Orleans slow with minimalistic use case - c#

I'm evaluating Orleans for a new project we are starting soon.
Eventually we want to run a bunch of persistent actors, but I'm currently struggling to get a baseline in-memory version of Orleans to be performant.
Given the following grain:
using Common.UserWallet;
using Common.UserWallet.Messages;
using Microsoft.Extensions.Logging;

namespace Grains;

public class UserWalletGrain : Orleans.Grain, IUserWalletGrain
{
    private readonly ILogger _logger;

    public UserWalletGrain(ILogger<UserWalletGrain> logger)
    {
        _logger = logger;
    }

    public async Task<CreateOrderResponse> CreateOrder(CreateOrderCommand command)
    {
        return new CreateOrderResponse(Guid.NewGuid());
    }

    public Task Ping()
    {
        return Task.CompletedTask;
    }
}
The following silo config:
static async Task<IHost> StartSiloAsync()
{
    ServicePointManager.UseNagleAlgorithm = false;

    var builder = new HostBuilder()
        .UseOrleans(c =>
        {
            c.UseLocalhostClustering()
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "OrleansBasics";
                })
                .ConfigureApplicationParts(
                    parts => parts.AddApplicationPart(typeof(HelloGrain).Assembly).WithReferences())
                .AddMemoryGrainStorage("OrleansMemoryProvider");
        });

    var host = builder.Build();
    await host.StartAsync();

    return host;
}
And the following client code:
static async Task<IClusterClient> ConnectClientAsync()
{
    var client = new ClientBuilder()
        .UseLocalhostClustering()
        .Configure<ClusterOptions>(options =>
        {
            options.ClusterId = "dev";
            options.ServiceId = "OrleansBasics";
        })
        //.ConfigureLogging(logging => logging.AddConsole())
        .Build();

    await client.Connect();
    Console.WriteLine("Client successfully connected to silo host \n");

    return client;
}
static async Task DoClientWorkAsync(IClusterClient client)
{
    List<IUserWalletGrain> grains = new List<IUserWalletGrain>();

    foreach (var _ in Enumerable.Range(1, 100))
    {
        var walletGrain = client.GetGrain<IUserWalletGrain>(Guid.NewGuid());
        await walletGrain.Ping(); // make sure grain is loaded
        grains.Add(walletGrain);
    }

    var sw = Stopwatch.StartNew();

    await Parallel.ForEachAsync(Enumerable.Range(1, 100000), async (o, token) =>
    {
        var command = new Common.UserWallet.Messages.CreateOrderCommand(Guid.NewGuid(), 4, 5, new List<Guid> { Guid.NewGuid(), Guid.NewGuid() });
        var response = await grains[o % 100].CreateOrder(command);
        Console.WriteLine($"{o % 10}:{o}");
    });

    Console.WriteLine($"\nElapsed:{sw.ElapsedMilliseconds}\n\n");
}
I'm able to send 100,000 messages in 30 seconds, which amounts to about 3,333 messages per second. This is way less than I would expect when looking at (https://github.com/yevhen/Orleans.PingPong).
It also does not seem to matter whether I start off with 10 grains, 100 grains, or 1000 grains.
When I then add persistence with Azure Table Storage configured:
.AddAzureTableGrainStorage(
    name: "OrleansMemoryProvider",
    configureOptions: options =>
    {
        options.UseJson = true;
        options.ConfigureTableServiceClient("secret");
    })
and a single await WriteStateAsync(); call in CreateOrder, things get drastically worse, at about 280 msgs/s.
When I go a bit further and implement some basic domain logic (calling other actors, etc.), we essentially grind to a snail's pace at 1.2 msgs/s.
What gives?
EDIT:
My CPU is at about 50%.

Building high performance applications can be tricky and nuanced. The general solution in Orleans is that you have many grains and many callers, so you can achieve a high degree of concurrency and thus throughput. In your case, you have many grains (100), but you have few callers (I believe it's one per core by default with Parallel.ForEachAsync), and each caller is writing to the console after every call, which will slow things down substantially.
If I remove the Console.WriteLine and run your code on my machine using Orleans 7.0-rc2, the 100K calls to 100 grains finish in about 850ms. If I change the CreateOrderRequest & CreateOrderResponse types from classes to structs, the duration decreases to 750ms.
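For reference, a rough sketch of what that struct change could look like with Orleans 7's [GenerateSerializer]/[Id] attributes (the property name is an assumption; the question only shows the constructor call):
// Hedged sketch: a message type as an immutable struct with Orleans 7's
// source-generated serializer attributes. The OrderId name is assumed.
[GenerateSerializer]
public struct CreateOrderResponse
{
    [Id(0)]
    public Guid OrderId { get; set; }

    public CreateOrderResponse(Guid orderId) => OrderId = orderId;
}
The command type would be treated the same way, with one [Id(n)] per member.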
If I run a more optimized ping test (the one from the Orleans repository), I see approximately 550K requests per second on my machine with one client and one silo process sharing the same CPU. The numbers are approximately half this for Orleans 3.x. If I co-host the client within the silo process (i.e., pull IClusterClient from the silo's IServiceProvider), then I see over 5M requests per second.
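To illustrate the co-hosting point, here is a sketch (assuming Orleans 7 and the same HostBuilder APIs used in the question) of pulling IClusterClient from the silo's own service provider instead of connecting a separate client:
// Hedged sketch: co-hosting the client inside the silo process.
// UseOrleans registers an IClusterClient in the host's IServiceProvider,
// so calls stay in-process and avoid the TCP hop of an external client.
var host = new HostBuilder()
    .UseOrleans(silo => silo.UseLocalhostClustering())
    .Build();

await host.StartAsync();

var client = host.Services.GetRequiredService<IClusterClient>();
var grain = client.GetGrain<IUserWalletGrain>(Guid.NewGuid());
await grain.Ping();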
Once you start doing non-trivial amounts of work in each of your grains, you're going to start running up against other limits. I tested calling a single grain from within the same process recently and found that one grain can handle 500K RPS if it is doing trivial work (ping-pong). If the grain has to write to storage on every request and each storage write takes 1ms then it will not be able to handle more than 1000 RPS, since each call waits for the previous call to finish by default. If you want to opt out of that behavior, you can do so by enabling reentrancy on your grain as described in the documentation here: https://learn.microsoft.com/en-us/dotnet/orleans/grains/reentrancy. The Chirper example has more details on how to implement reentrancy with storage updates: https://github.com/dotnet/orleans/tree/main/samples/Chirper.
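For completeness, marking a grain as reentrant is a one-attribute change; a sketch (the attribute lives in Orleans.Concurrency):
// Hedged sketch: allow requests to interleave on this grain so a pending
// storage write does not block the next incoming call.
[Orleans.Concurrency.Reentrant]
public class UserWalletGrain : Orleans.Grain, IUserWalletGrain
{
    // ... same members as in the question ...
}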
When grain methods become more complex and grains need to perform significant amounts of I/O to serve each request (for example, storage updates and subsequent grain calls), the throughput of each individual grain will decrease since each request involves more work. Hopefully, the above numbers give you an approximate guide.

Related

Task.WhenAll timing out after x number of Tasks

I have the following code that gets called from a Controller.
public async Task Execute()
{
    var collections = await _repo.GetCollections(); // This gets 500+ items

    List<Object1> coolCollections = new List<Object1>();
    List<Object2> uncoolCollections = new List<Object2>();

    foreach (var collection in collections)
    {
        if (collection == "Something")
        {
            var specialObject = TurnObjectIntoSpecialObject(collection);
            uncoolCollections.Add(specialObject);
        }
        else
        {
            var anotherObject = TurnObjectIntoAnotherObject(collection);
            coolCollections.Add(anotherObject);
        }
    }

    var list1Async = coolCollections.Select(async obj => await restService.PostObject1(obj)); // each call takes 200 -> 2000ms
    var list2Async = uncoolCollections.Select(async obj => await restService.PostObject2(obj)); // each call takes 300 -> 3000ms
    var asyncTasks = list1Async.Concat<Task>(list2Async);

    await Task.WhenAll(asyncTasks); // Has 500+ tasks
}
Unfortunately, I'm getting a 504 error after around 300 or so requests. I can't change the API the RestService calls, so I'm stuck trying to make the above code more performant.
Changing Task.WhenAll to a foreach loop does work and does resolve the timeout, but it's very slow.
My question is: how can I make sure the above code does not time out after x number of requests?
Making more concurrent calls to a remote site or database doesn't improve throughput, quite the opposite. Conflicts between the concurrent operations mean that beyond a certain point everything will start taking more time, until the service crashes. Right now you have a possible 300-way blocking problem.
The way all services handle this is by restricting the number of concurrent connections. In fact, many services will throttle clients so they don't crash if someone ... sends 500 concurrent requests. Some may even tell you they're throttling you with a 429 response.
Another way is to use batch requests, so instead of making 300 or 500 calls you can send a batch of 500 operations, allowing the service to handle them in an efficient way.
You can use Parallel.ForEachAsync to execute multiple calls with a specified degree of parallelism:
ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 30
};

await Parallel.ForEachAsync(object1, parallelOptions, async (obj, token) =>
{
    await restService.PostObject1(obj);
});

await Parallel.ForEachAsync(object2, parallelOptions, async (obj, token) =>
{
    await restService.PostObject2(obj);
});
You can adjust the DOP to find what works best without slowing down the remote service.
If the service can handle it, you could start both pipelines concurrently and await both of them to complete:
ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 30
};

var task1 = Parallel.ForEachAsync(object1, parallelOptions, async (obj, token) =>
{
    await restService.PostObject1(obj);
});

var task2 = Parallel.ForEachAsync(object2, parallelOptions, async (obj, token) =>
{
    await restService.PostObject2(obj);
});

await Task.WhenAll(task1, task2);
It doesn't make sense to retry those operations before you limit the DOP. Retrying 500 failed requests will only lead to another failure. Retrying with a random or staggered delay is essentially the same as limiting the DOP from the start, except it takes far longer to complete.
Since you are unable to change the REST service, the 504 gateway timeout will remain.
A better solution would be to use a retry mechanism: if you receive a 504 error code, then you retry after 'x' seconds.
Why retry after 'x' seconds?
The reasons for the 504 could be many; it could be that the server does not handle more requests because it is at its maximum workload at the moment.
A good and battle-tested library for retry mechanisms is Polly.
You could also write your own function or action depending on the return type.
Depending on the happy-flow use case other methods could be used, but in this situation I went with the thought of you wanting to upload all the data even if an exception occurs.
Something to think about: if this service is provided by a third-party vendor, then look into the documentation. It will most likely have a section on max concurrent connections to the service.
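For illustration, a rough sketch of the Polly approach mentioned above (the retry count, delays, endpoint, and httpClient are placeholders, not a drop-in for your service):
// Hedged sketch: retry a POST up to 3 times with a growing delay whenever the
// response is a 504 or the request itself throws, using Polly's WaitAndRetryAsync.
var retryPolicy = Policy
    .HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.GatewayTimeout)
    .Or<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

var response = await retryPolicy.ExecuteAsync(
    () => httpClient.PostAsJsonAsync("api/objects", obj)); // placeholder endpoint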
This answer assumes that you are using .NET 6 or later. You could project the objects to an enumerable of object elements, and then parallelize the processing of the projected objects with the Parallel.ForEachAsync method. This method allows configuring the MaxDegreeOfParallelism, so that not all projected objects are processed at once. As a result the remote server will not be bombarded with more requests than it can handle, nor will the network bandwidth be saturated. The optimal MaxDegreeOfParallelism can be found by experimentation: start with a small number, like 5, and then gradually increase it until you find the sweet spot that offers the best performance.
public async Task Execute()
{
    var objects = await _repo.GetObjects();

    IEnumerable<object> projected = objects.Select(obj =>
    {
        if (obj == "Something")
        {
            return (object)TurnObjectIntoSpecialObject(obj);
        }
        else
        {
            return (object)TurnObjectIntoAnotherObject(obj);
        }
    });

    ParallelOptions options = new()
    {
        MaxDegreeOfParallelism = 5
    };

    await Parallel.ForEachAsync(projected, options, async (item, ct) =>
    {
        switch (item)
        {
            case Object1 obj1: await restService.PostObject1(obj1); break;
            case Object2 obj2: await restService.PostObject2(obj2); break;
            default: throw new NotImplementedException();
        }
    });
}
The above code processes the objects in the same order that they appear in the objects sequence. The PostObject1/PostObject2 operations are parallelized, but the TurnObjectIntoSpecialObject/TurnObjectIntoAnotherObject operations are not. If you want to parallelize these too, then you can feed the Parallel.ForEachAsync with the objects, and do the projection inside the parallel loop.
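For instance, a variant that projects inside the loop might look like this (same assumptions as the code above):
// Hedged sketch: feed the raw objects to the loop and project inside it,
// so the TurnObjectInto... calls also run with MaxDegreeOfParallelism = 5.
await Parallel.ForEachAsync(objects, options, async (obj, ct) =>
{
    if (obj == "Something")
        await restService.PostObject2(TurnObjectIntoSpecialObject(obj));
    else
        await restService.PostObject1(TurnObjectIntoAnotherObject(obj));
});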
In case of errors, only the first exception will be propagated. If you want to
propagate all the errors, you can find solutions here.

.Net5 HttpClient concurrency - performance

I created an HttpClient using IHttpClientFactory and sent 1000 GET calls in parallel to a WebApi, and observed a delay of about 3-5 minutes for each request. Once this completed, I again sent 1000 GET requests in parallel, and this time there was no delay.
I then increased the parallel requests to 2000. For the first batch, each request's delay was about 9-11 minutes, and for the second batch of 2000 parallel requests, each request's delay was ~5 minutes (whereas with 1000 requests there was no delay).
var client = _clientFactory.CreateClient();
client.BaseAddress = new Uri("http://localhost:5000");
client.Timeout = TimeSpan.FromMinutes(20);

List<Task> _task = new List<Task>();

for (int i = 1; i <= 4000; i++)
{
    _task.Add(ExecuteRequest(client, i));
    if (i % 2000 == 0)
    {
        await Task.WhenAll(_task);
        _task.Clear();
    }
}

private async Task ExecuteRequest(HttpClient client, int requestId)
{
    var result = await client.GetAsync($"Performance/{requestId}");
    var response = await result.Content.ReadAsStringAsync();
    var data = JsonConvert.DeserializeObject<Response>(response);
}
I'm trying to understand:
How many parallel requests does HttpClient support without delay?
How can I improve the performance of HttpClient for 2000 or more parallel requests?
How many parallel requests does HttpClient support without delay?
On modern .NET Core platforms, you're limited only by available memory. There's no built-in throttling that's on by default.
How can I improve the performance of HttpClient for 2000 or more parallel requests?
It sounds like you're being throttled by your server. If you want to test a more scalable server, try running this in your server's startup:
var desiredThreads = 2000;
ThreadPool.GetMaxThreads(out _, out var maxIoThreads);
ThreadPool.SetMaxThreads(desiredThreads, maxIoThreads);
ThreadPool.GetMinThreads(out _, out var minIoThreads);
ThreadPool.SetMinThreads(desiredThreads, minIoThreads);
What you're doing is causing worst-case perf for a "cold" (just newed up or empty connection pool) HttpClient.
When you make a new request, it looks for an open connection in the connection pool. When it doesn't find one, it tries to open up a new connection. By throwing a sudden burst at a cold client, most calls to SendAsync will end up trying to open a new connection.
This is a problem because a request that needs a new connection will require multiple round-trips to the server, whereas a request on an existing connection will only require a single round-trip. It gets even worse if you use HTTPS. You're heavily dependent on your network latency in this case.
If you are just benchmarking, then you'll want to benchmark steady-state performance, not warmup performance. Benchmark.NET should more or less do this for you.
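For example, a minimal BenchmarkDotNet skeleton for that kind of steady-state measurement might look like this (a sketch; the endpoint and port are placeholders):
// Hedged sketch: BenchmarkDotNet reuses one warmed-up HttpClient across iterations,
// so the measured numbers reflect steady-state rather than cold-start behavior.
public class HttpClientBenchmarks
{
    private readonly HttpClient _client = new() { BaseAddress = new Uri("http://localhost:5000") };

    [Benchmark]
    public async Task<string> SingleGet()
    {
        using var response = await _client.GetAsync("Performance/1");
        return await response.Content.ReadAsStringAsync();
    }
}

// Entry point: BenchmarkRunner.Run<HttpClientBenchmarks>();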
When you have requests that complete reasonably quick, it can be a lot faster to instead limit your initial concurrency to a smaller percentage of your total requests, and slowly ramp up your connection pool size from there. This allows subsequent requests to re-use connections. What you might try is something like below, which will only allow (rough behavior, not a guarantee) 10 new connections to be opened at once:
var sem = new SemaphoreSlim(10);
var client = new HttpClient();

async Task<HttpResponseMessage> MakeRequestAsync(HttpRequestMessage req)
{
    Task t = sem.WaitAsync();
    bool openNew = t.IsCompleted;
    await t;

    try
    {
        return await client.SendAsync(req);
    }
    finally
    {
        sem.Release(openNew ? 2 : 1);
    }
}
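Usage might then look like this (a sketch; the URL is a placeholder): fire the full batch through the helper and let the semaphore pace the connection ramp-up.
// Hedged sketch: issue 2000 requests through the throttled helper; the semaphore
// gradually widens as responses come back and connections start being reused.
var tasks = Enumerable.Range(1, 2000)
    .Select(i => MakeRequestAsync(new HttpRequestMessage(HttpMethod.Get, $"http://localhost:5000/Performance/{i}")))
    .ToList();
var responses = await Task.WhenAll(tasks);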

Parallel.ForEach MaxDegreeOfParallelism Strange Behavior with Increasing "Chunking"

I'm not sure if the title makes sense, it was the best I could come up with, so here's my scenario.
I have an ASP.NET Core app that I'm using more as a shell and for DI configuration. In Startup it adds a bunch of IHostedServices as singletons, along with their dependencies, also as singletons, with minor exceptions for SqlConnection and DbContext which we'll get to later. The hosted services are groups of similar services that:
Listen for incoming reports from GPS devices and put into a listening buffer.
Parse items out of the listening buffer and put into a parsed buffer.
Eventually there's a single service that reads the parsed buffer and actually processes the parsed reports. It does this by passing the report it took out of the buffer to a handler and awaiting it to complete before moving to the next. This has worked well for the past year, but it appears we're running into a scalability issue now because it's processing one report at a time, and the average time to process is 62ms on the server, which includes the Dapper trip to the database to get the data needed and the EF Core trip to save changes.
If, however, the handler decides that a report's information requires triggering background jobs, then I suspect it takes 100ms or more to complete. Over time, the buffer fills up faster than the handler can process it, to the point of holding tens if not hundreds of thousands of reports until they can be processed. This is an issue because notifications are delayed, and because it has the potential for data loss if the buffer is still full by the time the server restarts at midnight.
All that being said, I'm trying to figure out how to make the processing parallel. After lots of experimentation yesterday, I settled on using Parallel.ForEach over the buffer using GetConsumingEnumerable(). This works well, except for a weird behavior I don't know what to do about or even what to call. As the buffer is filled and the ForEach is iterating over it, it will begin to "chunk" the processing into ever-increasing multiples of two. The size of the chunking is affected by the MaxDegreeOfParallelism setting. For example (N# = next # of reports in buffer):
MDP = 1
N3 = 1 at a time
N6 = 2 at a time
N12 = 4 at a time
...
MDP = 2
N6 = 1 at a time
N12 = 2 at a time
N24 = 4 at a time
...
MDP = 4
N12 = 1 at a time
N24 = 2 at a time
N48 = 4 at a time
...
MDP = 8 (my CPU core count)
N24 = 1 at a time
N48 = 2 at a time
N96 = 4 at a time
...
This is arguably worse than the serial execution I have now because by the end of the day it will buffer and wait for, say, half a million reports before actually processing them.
Is there a way to fix this? I'm not very experienced with Parallel.ForEach so from my point of view this is strange behavior. Ultimately I'm looking for a way to parallel process the reports as soon as they are in the buffer, so if there's other ways to accomplish this I'm all ears. This is roughly what I have for the code. The handler that processes the reports does use IServiceProvider to create a scope and get an instance of SqlConnection and DbContext. Thanks in advance for any suggestions!
public sealed class GpsReportService : IHostedService {
    private readonly GpsReportBuffer _buffer;
    private readonly Config _config;
    private readonly GpsReportHandler _handler;
    private readonly ILogger _logger;

    public GpsReportService(
        GpsReportBuffer buffer,
        Config config,
        GpsReportHandler handler,
        ILogger<GpsReportService> logger) {
        _buffer = buffer;
        _config = config;
        _handler = handler;
        _logger = logger;
    }

    public Task StartAsync(CancellationToken cancellationToken) {
        _logger.LogInformation("GPS Report Service => starting");
        Task.Run(Process, cancellationToken).ConfigureAwait(false); // Is ConfigureAwait here correct usage?
        _logger.LogInformation("GPS Report Service => started");
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) {
        _logger.LogInformation("GPS Parsing Service => stopping");
        _buffer.CompleteAdding();
        _logger.LogInformation("GPS Parsing Service => stopped");
        return Task.CompletedTask;
    }

    // ========================================================================
    // Utilities
    // ========================================================================

    private void Process() {
        var options = new ParallelOptions {
            MaxDegreeOfParallelism = 8,
            CancellationToken = CancellationToken.None
        };

        Parallel.ForEach(_buffer.GetConsumingEnumerable(), options, async report => {
            try {
                await _handler.ProcessAsync(report).ConfigureAwait(false);
            } catch (Exception e) {
                if (_config.IsDevelopment) {
                    throw;
                }
                _logger.LogError(e, "GPS Report Service");
            }
        });
    }

    private async Task ProcessAsync() {
        while (!_buffer.IsCompleted) {
            try {
                var took = _buffer.TryTake(out var report, 10);
                if (!took) {
                    continue;
                }
                await _handler.ProcessAsync(report!).ConfigureAwait(false);
            } catch (Exception e) {
                if (_config.IsDevelopment) {
                    throw;
                }
                _logger.LogError(e, "GPS Report Service");
            }
        }
    }
}

public sealed class GpsReportBuffer : BlockingCollection<GpsReport> {
}
You can't use Parallel methods with async delegates - at least, not yet.
Since you already have a "pipeline" style of architecture, I recommend looking into TPL Dataflow. A single ActionBlock may be all that you need, and once you have that working, other blocks in TPL Dataflow may replace other parts of your pipeline.
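For illustration, a single ActionBlock wired to the existing handler might look roughly like this (a sketch; the option values and names are assumptions, not a drop-in replacement):
// Hedged sketch: an ActionBlock that processes reports with bounded concurrency
// and back-pressure instead of an unbounded in-memory buffer.
var processor = new ActionBlock<GpsReport>(async report =>
{
    await _handler.ProcessAsync(report);
},
new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 8,
    BoundedCapacity = 1000
});

// Producers post into the block instead of the BlockingCollection:
//   processor.Post(report);
// On shutdown:
//   processor.Complete();
//   await processor.Completion;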
If you prefer to stick with your existing buffer, then you should use asynchronous concurrency instead of Parallel:
private async Task Process() {
    var throttler = new SemaphoreSlim(8);
    var tasks = _buffer.GetConsumingEnumerable()
        .Select(async report =>
        {
            await throttler.WaitAsync();
            try {
                await _handler.ProcessAsync(report).ConfigureAwait(false);
            } catch (Exception e) {
                if (_config.IsDevelopment) {
                    throw;
                }
                _logger.LogError(e, "GPS Report Service");
            }
            finally {
                throttler.Release();
            }
        })
        .ToList();

    await Task.WhenAll(tasks);
}
You have an event stream processing/dataflow problem, not a parallelism problem. If you use the appropriate classes, like the Dataflow blocks, Channels, or Reactive Extensions the problem is simplified a lot.
Even if you want to use a single buffer and a fat worker method though, the appropriate buffer class is the asynchronous Channel, not BlockingCollection. The code could become as simple as:
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    await foreach (GpsMessage msg in _reader.ReadAllAsync(stoppingToken))
    {
        await _handler.ProcessAsync(msg);
    }
}
The first option shows how to use a Dataflow to create a pipeline. The second, how to use Channel instead of BlockingCollection to process multiple queued items concurrently
A pipeline with Dataflow
Once you break the process into independent methods, it's easy to create a pipeline of processing steps using any library.
async IAsyncEnumerable<GpsMessage> Poller(DateTime time, IList<Device> devices, [EnumeratorCancellation] CancellationToken token = default)
{
    foreach (var device in devices)
    {
        if (token.IsCancellationRequested)
        {
            break;
        }
        var msg = await device.ReadMessage();
        yield return msg;
    }
}

GpsReport Parser(GpsMessage msg)
{
    // Do some parsing magic.
    return report;
}

async Task<GpsReport> Enrich(GpsReport report, string connectionString, CancellationToken token = default)
{
    // Depend on connection pooling to eliminate the cost of connections.
    // We may have to use a pool of opened connections otherwise.
    using var con = new SqlConnection(connectionString);
    var extraData = await con.QueryAsync<Extra>(sql, new { deviceId = report.DeviceId }, token);
    report.Extra = extraData;
    return report;
}

async Task BulkImport(GpsReport[] reports, CancellationToken token = default)
{
    using var bcp = new SqlBulkCopy(...);
    using var reader = ObjectReader.Create(reports);
    ...
    await bcp.WriteToServerAsync(reader, token);
}
In the BulkImport method I use FastMember's ObjectReader to create an IDataReader wrapper over the reports so I can use them with SqlBulkCopy. Another option would be to convert them to a DataTable, but that would create an extra copy of the data in memory.
Combining all these with Dataflow is relatively easy.
var execOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10
};

_poller = new TransformManyBlock<DateTime, GpsMessage>(time => Poller(time, devices));
_parser = new TransformBlock<GpsMessage, GpsReport>(b => Parser(b), execOptions);
_enricher = new TransformBlock<GpsReport, GpsReport>(rpt => Enrich(rpt, connStr), execOptions);
_batch = new BatchBlock<GpsReport>(50);
_bcpBlock = new ActionBlock<GpsReport[]>(reports => BulkImport(reports));
Each block has an input and output buffer (except ActionBlock). Each block takes care of processing the messages in its input buffer and processes it. By default, each block uses only one worker task, but that can be changed. The message order is maintained, so if we use eg 10 worker tasks for the parser block, the messages will still be emitted in the order they were received.
Next comes linking the blocks.
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
_poller.LinkTo(_parser, linkOptions);
_parser.LinkTo(_enricher, linkOptions);
_enricher.LinkTo(_batch, linkOptions);
_batch.LinkTo(_bcpBlock, linkOptions);
After that, a timer can be used to "ping" the head block, the poller, whenever we want:
private void Ping(object state)
{
    _poller.Post(DateTime.Now);
}

public Task StartAsync(CancellationToken stoppingToken)
{
    _logger.LogInformation("Timed Hosted Service running.");
    _timer = new Timer(Ping, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
    return Task.CompletedTask;
}
To stop the pipeline gracefully, we call Complete() on the head block and await the Completion task on the last block. Assuming the hosted service is similar to the timed background service example:
public async Task StopAsync(CancellationToken cancellationToken)
{
    ....
    _timer?.Change(Timeout.Infinite, 0);
    _poller.Complete();
    await _bcpBlock.Completion;
    ...
}
Using Channel as an Async queue
A Channel is a far better alternative for asynchronous publisher/subscriber scenarios than BlockingCollection. Roughly, it's an asynchronous queue that goes to extremes to prevent the publisher from reading, or the subscriber from writing, by forcing callers to use the ChannelWriter and ChannelReader classes. In fact, it's quite common to only pass those classes around, never the Channel instance itself.
In your publishing code, you can create a Channel<T> and pass its Reader to the GpsReportService service. Let's assume the publisher is another service that implements an IGpsPublisher interface:
public interface IGpsPublisher
{
    ChannelReader<GpsMessage> Reader { get; }
}
and the implementation
Channel<GpsMessage> _channel = Channel.CreateUnbounded<GpsMessage>();

public ChannelReader<GpsMessage> Reader => _channel.Reader;
private async void Ping(object state)
{
    foreach (var device in devices)
    {
        if (token.IsCancellationRequested)
        {
            break;
        }
        var msg = await device.ReadMessage();
        await _channel.Writer.WriteAsync(msg);
    }
}
public Task StartAsync(CancellationToken stoppingToken)
{
    _timer = new Timer(Ping, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
    return Task.CompletedTask;
}

public Task StopAsync(CancellationToken cancellationToken)
{
    _timer?.Change(Timeout.Infinite, 0);
    _channel.Writer.Complete();
    return Task.CompletedTask;
}
This can be passed to GpsReportService as a dependency that will be resolved by the DI container:
public sealed class GpsReportService : BackgroundService
{
    private readonly ChannelReader<GpsMessage> _reader;

    public GpsReportService(
        IGpsPublisher publisher,
        ...)
    {
        _reader = publisher.Reader;
        ...
    }
And used
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    await foreach (GpsMessage msg in _reader.ReadAllAsync(stoppingToken))
    {
        await _handler.ProcessAsync(msg);
    }
}
Once the publisher completes, the subscriber loop will also complete once all messages are processed.
To process in parallel, you can start multiple loops concurrently:
async Task Process(ChannelReader<GpsMessage> reader, CancellationToken token)
{
    await foreach (GpsMessage msg in reader.ReadAllAsync(token))
    {
        await _handler.ProcessAsync(msg);
    }
}

protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    var tasks = Enumerable.Range(0, 10)
        .Select(_ => Process(_reader, stoppingToken))
        .ToArray();

    await Task.WhenAll(tasks);
}
Explaining the pipeline
I have a similar situation: every 15 minutes I request air ticket sales reports from airlines (actually GDSs), parse them to extract data and ticket numbers, download the ticket record for each ticket to get some extra data and save everything to the database. I have to do that for 20+ cities (ticket reports are per city) with each report having from 10 to over 100K tickets.
This almost begs for a pipeline. Using your example, you can create a pipeline with the following steps/blocks:
Listen for GPS messages and emit the unparsed message.
Parse the message and emit the parsed message
Load any extra data needed per message and emit a combined record
Handle the combined record and emit the result
(Optional) batch results
Save the results to the database
All three options (Dataflow, Channels, Rx) take care of buffering between the steps. Dataflow is a some-assembly-required library for pipelines processing independent events, Rx is ready-made for analyzing streams of events where time is important (e.g., to calculate average speed in a sliding window), and Channels are Lego bricks that can do anything but need to be put together.
Why not Parallel.ForEach
Parallel.ForEach is meant for data parallelism, not async operations. It's meant to process large chunks of in-memory data, independent of each other. Amdahl's Law explains that parallelization benefits are limited by the synchronous part of an operation, so all data parallelism libraries try to reduce that by partitioning, and using one core/machine/node to process each partition.
Parallel.ForEach also works by partitioning the data and using roughly one worker task per CPU core, to reduce synchronization between cores. It will even use the current thread which leads to the mistaken assumption it's blocking. When all cores are busy, why not use the thread? It won't be able to run anyway.
The Parallel.ForEach employs chunk partitioning by default, which is intended for reducing the synchronization overhead in CPU-intensive applications, but can result in problematic behavior in some usage scenarios. The chunk partitioning can be disabled by passing as argument a Partitioner<T> instead of an IEnumerable<T>:
Parallel.ForEach(Partitioner.Create(_buffer.GetConsumingEnumerable(),
    EnumerablePartitionerOptions.NoBuffering), options, ...
You can also find a custom partitioner, tailored specifically for BlockingCollection<T>s, in this article: ParallelExtensionsExtras Tour – #4 – BlockingCollectionExtensions
That said, the Parallel.ForEach is not async-friendly, meaning that it doesn't understand async delegates. The lambda passed is async void, which is something to avoid. So I would recommend using an ActionBlock<T> instead.

Client-side request rate-limiting

I'm designing a .NET client application for an external API. It's going to have two main responsibilities:
Synchronization - making a batch of requests to API and saving responses to my database periodically.
Client - a pass-through for requests to API from users of my client.
The service's documentation specifies the following rules on the maximum number of requests that can be issued in a given period of time:
During a day:
Maximum of 6000 requests per hour (~1.67 per second)
Maximum of 120 requests per minute (2 per second)
Maximum of 3 requests per second
At night:
Maximum of 8000 requests per hour (~2.23 per second)
Maximum of 150 requests per minute (2.5 per second)
Maximum of 3 requests per second
Exceeding these limits won't result in an immediate lockdown - no exception will be thrown. But the provider can get annoyed, contact us, and then ban us from using the service. So I need to have some request-delaying mechanism in place to prevent that. Here's how I see it:
public async Task MyMethod(Request request)
{
    await _rateLimiter.WaitForNextRequest(); // awaitable Task with calculated Delay
    await _api.DoAsync(request);
    _rateLimiter.AppendRequestCounters();
}
The safest and simplest option would be to respect the lowest rate limit only, that is, a max of 3 requests per 2 seconds. But because of the "Synchronization" responsibility, there is a need to use as much of these limits as possible.
So the next option would be to add a delay based on the current request count. I've tried to do something on my own and I've also used RateLimiter by David Desmaisons, and it would've been fine, but here's the problem:
Assuming there will be 3 requests per second sent by my client to the API at day, we're going to see:
A 20 second delay every 120th request
A ~15 minute delay every 6000th request
This would've been acceptable if my application was only about "Synchronization", but "Client" requests can't wait that long.
I've searched the Web, and I've read about token/leaky bucket and sliding window algorithms, but I couldn't translate them to my case and .NET, since they mainly cover the rejecting of requests that exceed a limit. I've found this repo and that repo, but they are both only service-side solutions.
QoS-like splitting of rates, so that "Synchronization" would have the slower and "Client" the faster rate, is not an option.
Assuming that current request rates will be measured, how to calculate the delay for next request so that it could be adaptive to current situation, respect all maximum rates and wouldn't be longer than 5 seconds? Something like gradually slowing down when approaching a limit.
This is achievable by using the Library you linked on GitHub. We need to use a composed TimeLimiter made out of 3 CountByIntervalAwaitableConstraint like so:
var hourConstraint = new CountByIntervalAwaitableConstraint(6000, TimeSpan.FromHours(1));
var minuteConstraint = new CountByIntervalAwaitableConstraint(120, TimeSpan.FromMinutes(1));
var secondConstraint = new CountByIntervalAwaitableConstraint(3, TimeSpan.FromSeconds(1));

var timeLimiter = TimeLimiter.Compose(hourConstraint, minuteConstraint, secondConstraint);
We can test to see if this works by doing this:
for (int i = 0; i < 1000; i++)
{
    await timeLimiter;
    Console.WriteLine($"Iteration {i} at {DateTime.Now:T}");
}
This will run 3 times every second until we reach 120 iterations (iteration 119), then wait until the minute is over, and then continue running 3 times every second. We can also (again using the library) easily use the TimeLimiter with an HttpClient by using the AsDelegatingHandler() extension method provided, like so:
var handler = TimeLimiter.Compose(hourConstraint, minuteConstraint, secondConstraint).AsDelegatingHandler();
var client = new HttpClient(handler);
We can also use CancellationTokens, but as far as I can tell not at the same time as also using it as the handler for the HttpClient. Here is how you can use it with an HttpClient anyway:
var timeLimiter = TimeLimiter.Compose(hourConstraint, minuteConstraint, secondConstraint);
var client = new HttpClient();

for (int i = 0; i < 100; i++)
{
    await timeLimiter.Enqueue(async () =>
    {
        var response = await client.GetAsync("https://hacker-news.firebaseio.com/v0/item/8863.json?print=pretty");
        if (response.IsSuccessStatusCode)
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        else
            Console.WriteLine($"Error code {response.StatusCode} reason: {response.ReasonPhrase}");
    }, new CancellationTokenSource(TimeSpan.FromSeconds(10)).Token);
}
Edit to address the OP's question more:
If you want to make sure a user can send a request without having to wait for the limit to be over, we need to dedicate a certain number of requests per second/minute/hour to our user. So we need a new TimeLimiter for this and also need to adjust our API TimeLimiter. Here are the two new ones:
var apiHourConstraint = new CountByIntervalAwaitableConstraint(5500, TimeSpan.FromHours(1));
var apiMinuteConstraint = new CountByIntervalAwaitableConstraint(100, TimeSpan.FromMinutes(1));
var apiSecondConstraint = new CountByIntervalAwaitableConstraint(2, TimeSpan.FromSeconds(1));
// TimeLimiter for calls automatically to the API
var apiTimeLimiter = TimeLimiter.Compose(apiHourConstraint, apiMinuteConstraint, apiSecondConstraint);
var userHourConstraint = new CountByIntervalAwaitableConstraint(500, TimeSpan.FromHours(1));
var userMinuteConstraint = new CountByIntervalAwaitableConstraint(20, TimeSpan.FromMinutes(1));
var userSecondConstraint = new CountByIntervalAwaitableConstraint(1, TimeSpan.FromSeconds(1));
// TimeLimiter for calls made manually by a user to the API
var userTimeLimiter = TimeLimiter.Compose(userHourConstraint, userMinuteConstraint, userSecondConstraint);
You can play around with the numbers to suit your need.
Now to use it:
I saw you're using a central Method to execute your Requests, this makes it easier. I'll just add an optional boolean parameter that determines if it's an automatically executed request or one made from a user. (You could replace this parameter with an Enum if you want more than just automatic and manual requests)
public static async Task DoRequest(Request request, bool manual = false)
{
    TimeLimiter limiter;

    if (manual)
        limiter = TimeLimiterManager.UserLimiter;
    else
        limiter = TimeLimiterManager.ApiLimiter;

    await limiter;
    await _api.DoAsync(request);
}
static class TimeLimiterManager
{
    public static TimeLimiter ApiLimiter { get; }
    public static TimeLimiter UserLimiter { get; }

    static TimeLimiterManager()
    {
        var apiHourConstraint = new CountByIntervalAwaitableConstraint(5500, TimeSpan.FromHours(1));
        var apiMinuteConstraint = new CountByIntervalAwaitableConstraint(100, TimeSpan.FromMinutes(1));
        var apiSecondConstraint = new CountByIntervalAwaitableConstraint(2, TimeSpan.FromSeconds(1));

        // TimeLimiter to control access to the API for automatically executed requests
        ApiLimiter = TimeLimiter.Compose(apiHourConstraint, apiMinuteConstraint, apiSecondConstraint);

        var userHourConstraint = new CountByIntervalAwaitableConstraint(500, TimeSpan.FromHours(1));
        var userMinuteConstraint = new CountByIntervalAwaitableConstraint(20, TimeSpan.FromMinutes(1));
        var userSecondConstraint = new CountByIntervalAwaitableConstraint(1, TimeSpan.FromSeconds(1));

        // TimeLimiter to control access to the API for manually executed requests
        UserLimiter = TimeLimiter.Compose(userHourConstraint, userMinuteConstraint, userSecondConstraint);
    }
}
This isn't perfect, as when the user doesn't execute 20 API calls every minute but your automated system needs to execute more than 100 every minute it will have to wait.
And regarding day/night differences: you can use two backing fields for the Api/UserLimiter and return the appropriate one in the property getter, as sketched below.
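A sketch of that day/night switch (the 22:00-06:00 night window is an assumption; the constraint values mirror the limits from the question):
// Hedged sketch: keep separate day and night limiters and pick one per access.
static class TimeLimiterManager
{
    private static readonly TimeLimiter _dayApiLimiter = TimeLimiter.Compose(
        new CountByIntervalAwaitableConstraint(6000, TimeSpan.FromHours(1)),
        new CountByIntervalAwaitableConstraint(120, TimeSpan.FromMinutes(1)),
        new CountByIntervalAwaitableConstraint(3, TimeSpan.FromSeconds(1)));

    private static readonly TimeLimiter _nightApiLimiter = TimeLimiter.Compose(
        new CountByIntervalAwaitableConstraint(8000, TimeSpan.FromHours(1)),
        new CountByIntervalAwaitableConstraint(150, TimeSpan.FromMinutes(1)),
        new CountByIntervalAwaitableConstraint(3, TimeSpan.FromSeconds(1)));

    // Assumed night window; adjust to whatever the provider defines as "night".
    private static bool IsNight => DateTime.Now.Hour >= 22 || DateTime.Now.Hour < 6;

    public static TimeLimiter ApiLimiter => IsNight ? _nightApiLimiter : _dayApiLimiter;
}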

Getting an error “A task was canceled.” when processing a large record set in Parallel.Invoke

I am using Parallel.Invoke to call a large array of Actions on a 4 core machine.
Each action makes a call to an external web api to retrieve a json package of info. That json package is then de-serialized into a series of objects. Each of those objects is then inserted into several tables via EntityFramework 6.
This will process around 2,000 distinct IDs, so I am trying to use the Parallel library to get as fast a throughput as possible.
My main:
private static void Main(string[] args)
{
    var apiKey = "myKey";
    var tok = new CancellationTokenSource(); // tok was not declared in the snippet; a CancellationTokenSource is assumed
    List<string> caseIDs = new List<string>();

    // read list of ids from DB
    using (var db = new StagingContext())
    {
        caseIDs = db.BatchList.Where(b => b.CaseID != null).Select(a => a.CaseID).Distinct().Take(5000).ToList();
    }

    List<Action> actions = new List<Action>();
    foreach (var id in caseIDs)
    {
        var UniqueID = Guid.NewGuid();
        actions.Add(() => GetRecords(id, "https://myAPIURL/{0}?api={1}&case={2}", apiKey, UniqueID));
    }

    ParallelOptions op = new ParallelOptions
    {
        CancellationToken = tok.Token,
        MaxDegreeOfParallelism = 10
    };

    Parallel.Invoke(op, actions.ToArray());

    Console.WriteLine("Done");
    Console.ReadKey();
}
My action:
private static void GetRecords(string CaseID, string url, string apiKey, Guid UniqueID)
{
    using (HttpClient client = new HttpClient())
    {
        var tmpUrl = string.Format(url, apiKey, CaseID);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        var result = client.GetAsync(tmpUrl).Result;
        var jsonString = result.Content.ReadAsStringAsync();
        jsonString.Wait();

        var myObjectList = new List<MyObject>();
        if (!jsonString.Result.Contains("error"))
        {
            myObjectList.AddRange(JsonConvert.DeserializeObject<List<MyObject>>(jsonString.Result));
            foreach (var item in myObjectList)
            {
                item.UniqueID = UniqueID;
            }
        }

        // Write this out to DB
        using (var db = new StagingContext())
        {
            var myMappedObjectList = myObjectList.Adapt<List<MyObject>>();
            db.CaseAttributeHistories.AddRange(myMappedObjectList);

            using (var scope = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted }))
            {
                db.SaveChanges();
                scope.Complete();
            }
        }
    }
}
When I process a smaller set of data, ~1000 records, it works pretty well. When I process a larger data set, >1400, I often get an
"A task was canceled."
error.
I am new to Parallel & multi-threading.
Is this a valid approach?
Is there a good way to track down what is causing the cancellation?
How would I handle/ignore the error and continue with the rest of the records?
Is there a better or faster pattern to use in this situation?
First, check for exceptions. Swallowing an exception is a deadly sin of exception handling, and unfortunately multithreading does that fully automatically. Normally you have to write code to swallow an exception; in multithreading you have to write code to avoid it. I would advise reading these two articles on exception handling before you try your hand at multithreading:
http://blogs.msdn.com/b/ericlippert/archive/2008/09/10/vexing-exceptions.aspx
http://www.codeproject.com/Articles/9538/Exception-Handling-Best-Practices-in-NET
Secondly, doing sequential calls to a web API is generally a bad idea. Please verify that you do not have a way to retrieve the data in bulk rather than piecemeal. Piecemeal retrieval often incurs more overhead than data.
Third, are you even allowed to automate it on that scale? If the API provider doesn't want bulk retrieval, he might not want automation on that scale either. If so, he might notice the sudden increase in load and apply some load throttling later. That could kill your program.
Fourth, multithreading an API call will probably not speed things up. The web API and the network will be the bottleneck with a very high probability. Multithreading only helps with CPU-bottlenecked operations. With network, disk, DB, and similar operations, there will often be zero performance increase, or even a performance decrease, as the multiple operations get in each other's way.
A bit of multitasking (even just a single alternate thread) is mandatory with network, disk, and similar long-running operations. But actual multithreading rarely if ever helps.
I bet the exception is being thrown from client.GetAsync?
HttpClient will throw TaskCanceledException when the HTTP call times out. (i.e. the web service is not responding)
Annoying, I know.
It's possible that, because you're hitting it so hard, it can't keep up. You can try raising the Timeout property of your HttpClient, but the default is already 100 seconds.
If you want to just ignore those errors, then wrap the client.GetAsync(tmpUrl) in a try/catch block and just return (and maybe log it somewhere).
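Something along these lines (a sketch; since your code uses .Result, the timeout surfaces as an AggregateException wrapping the TaskCanceledException, and the logging here is just a placeholder):
// Hedged sketch: ignore timed-out calls and keep going with the remaining records.
HttpResponseMessage result;
try
{
    result = client.GetAsync(tmpUrl).Result;
}
catch (AggregateException ex) when (ex.InnerException is TaskCanceledException)
{
    // The call timed out; log the CaseID somewhere and skip this record.
    Console.WriteLine($"Request for case {CaseID} timed out, skipping.");
    return;
}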
