I have the following code that gets called from a Controller.
public async Task Execute()
{
    var collections = await _repo.GetCollections(); // This gets 500+ items

    List<Object1> coolCollections = new List<Object1>();
    List<Object2> uncoolCollections = new List<Object2>();

    foreach (var collection in collections)
    {
        if (collection == "Something")
        {
            var specialObject = TurnObjectIntoSpecialObject(collection);
            uncoolCollections.Add(specialObject);
        }
        else
        {
            var anotherObject = TurnObjectIntoAnotherObject(collection);
            coolCollections.Add(anotherObject);
        }
    }

    var list1Async = coolCollections.Select(async obj => await restService.PostObject1(obj)); // each call takes 200 -> 2000ms
    var list2Async = uncoolCollections.Select(async obj => await restService.PostObject2(obj)); // each call takes 300 -> 3000ms

    var asyncTasks = list1Async.Concat<Task>(list2Async);
    await Task.WhenAll(asyncTasks); // As 500+ 'tasks'
}
Unfortunately, I'm getting a 504 error after around 300 or so requests. I can't change the API the RestService calls, so I'm stuck trying to make the above code more performant.
Changing Task.WhenAll to a foreach loop does work and does resolve the timeout, but it's very slow.
My question is: how can I make sure the above code does not time out after x number of requests?
Making more concurrent calls to a remote site or database doesn't improve throughput; quite the opposite. Conflicts between the concurrent operations mean that beyond a certain point everything will start taking more time, until the service crashes. Right now you have a possible 300-way blocking problem.
The way all services handle this is by restricting the number of concurrent connections. In fact, many services will throttle clients so they don't crash if someone sends 500 concurrent requests. Some may even tell you they're throttling you with a 429 response.
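If the service does respond with 429, a client can cooperate by honoring the Retry-After header. A minimal sketch (httpClient, uri and payload are placeholders, and HttpStatusCode.TooManyRequests assumes .NET Core 2.1 or later):

// Sketch: respect the server's Retry-After hint on a 429, then retry once.
var response = await httpClient.PostAsync(uri, new StringContent(payload));
if (response.StatusCode == HttpStatusCode.TooManyRequests)
{
    var delay = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(5);
    await Task.Delay(delay);
    response = await httpClient.PostAsync(uri, new StringContent(payload));
}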
Another way is to use batch requests, so instead of making 300 or 500 calls you can send a batch of 500 operations, allowing the service to handle them in an efficient way.
You can use Parallel.ForEachAsync to execute multiple calls with a specified degree of parallelism:
ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 30
};

await Parallel.ForEachAsync(coolCollections, parallelOptions, async (obj, ct) =>
{
    await restService.PostObject1(obj);
});

await Parallel.ForEachAsync(uncoolCollections, parallelOptions, async (obj, ct) =>
{
    await restService.PostObject2(obj);
});
You can adjust the DOP to find what works best without slowing down the remote service.
If the service can handle it, you could start both pipelines concurrently and await both of them to complete:
ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 30
};

var task1 = Parallel.ForEachAsync(coolCollections, parallelOptions, async (obj, ct) =>
{
    await restService.PostObject1(obj);
});

var task2 = Parallel.ForEachAsync(uncoolCollections, parallelOptions, async (obj, ct) =>
{
    await restService.PostObject2(obj);
});

await Task.WhenAll(task1, task2);
It doesn't make sense to retry those operations before you limit the DOP. Retrying 500 failed requests will only lead to another failure. Retrying with a random or staggered delay is essentially the same as limiting the DOP from the start, except it takes far longer to complete.
Since you are unable to change the REST service, the 504 gateway timeout will remain.
A better solution would be to use a retry mechanism: if you receive a 504 error code, then retry after 'x' seconds.
Why retry after 'x' seconds?
The reasons for the 504 could be many; it could be that the server cannot handle more requests because it is at its maximum workload at the moment.
A good, battle-tested library for retry mechanisms is Polly.
You could also write your own retry function or action, depending on the return type.
Depending on the happy-flow use case other methods could be used, but in this situation I went with the assumption that you want to upload all the data even if an exception occurs.
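A minimal sketch with Polly (assuming Polly v7, and assuming the RestService surfaces failures as HttpRequestException; swap in whatever your client actually throws):

using Polly;

// Retry up to 3 times, waiting longer after each failed attempt.
// The exception type and delays here are assumptions - tune them.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt => TimeSpan.FromSeconds(5 * attempt));

await retryPolicy.ExecuteAsync(() => restService.PostObject1(obj));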
Something to think about, if this service is provided by a third party vendor then look into the documentation. It will most likely have a section on max concurrent connections to the service.
This answer assumes that you are using .NET 6 or later. You could project the objects to an enumerable of object elements, and then parallelize the processing of the projected objects with the Parallel.ForEachAsync method. This method allows you to configure the MaxDegreeOfParallelism, so that not all projected objects are processed at once. As a result the remote server will not be bombarded with more requests than it can handle, nor will the network bandwidth be saturated. The optimal MaxDegreeOfParallelism can be found by experimentation. Start with a small number, like 5, and then gradually increase it until you find the sweet spot that offers the best performance.
public async Task Execute()
{
    var objects = await _repo.GetObjects();

    IEnumerable<object> projected = objects.Select(obj =>
    {
        if (obj == "Something")
        {
            return (object)TurnObjectIntoSpecialObject(obj);
        }
        else
        {
            return (object)TurnObjectIntoAnotherObject(obj);
        }
    });

    ParallelOptions options = new()
    {
        MaxDegreeOfParallelism = 5
    };

    await Parallel.ForEachAsync(projected, options, async (item, ct) =>
    {
        switch (item)
        {
            case Object1 obj1: await restService.PostObject1(obj1); break;
            case Object2 obj2: await restService.PostObject2(obj2); break;
            default: throw new NotImplementedException();
        }
    });
}
The above code processes the objects in the same order that appear in the objects sequence. The PostObject1/PostObject2 operations are parallelized, but the TurnObjectIntoSpecialObject/TurnObjectIntoAnotherObject operations are not. If you want to parallelize these too, then you can feed the Parallel.ForEachAsync with the objects, and do the projection inside the parallel loop.
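That variant could look roughly like this (a sketch reusing the helper methods and options from above):

await Parallel.ForEachAsync(objects, options, async (obj, ct) =>
{
    // Projection now happens inside the loop, so it's parallelized too.
    if (obj == "Something")
        await restService.PostObject2(TurnObjectIntoSpecialObject(obj));
    else
        await restService.PostObject1(TurnObjectIntoAnotherObject(obj));
});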
In case of errors, only the first exception will be propagated. If you want to
propagate all the errors, you can find solutions here.
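One common approach, sketched here as a suggestion rather than a quote of those solutions, is to collect the exceptions yourself inside the loop body and throw them together at the end:

using System.Collections.Concurrent;

var errors = new ConcurrentQueue<Exception>();
await Parallel.ForEachAsync(projected, options, async (item, ct) =>
{
    try
    {
        // ...post the item as shown above...
    }
    catch (Exception ex)
    {
        errors.Enqueue(ex); // keep every failure, not just the first
    }
});
if (!errors.IsEmpty) throw new AggregateException(errors);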
I have an API which needs to be run in a loop for Mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However, it can be run in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
My current solution below just provides a list; would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>();
    foreach (AddressRequestDto request in requests)
    {
        var newCombinedAddress = (await GetCombinedAddress(request)).Value;
        combinedAddressResponses.Add(newCombinedAddress);
    }
    return combinedAddressResponses;
}
Update:
In the debugger I have to go to combinedAddressResponse.Result.Value, while combinedAddressResponse.Value is null.
Also, strangely, writing combinedAddressResponse.Result.Value in code gives the error below: "ActionResult does not contain a definition for 'Value' and no accessible extension method".
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially), doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
    var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
    foreach (var request in requests)
    {
        tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
    }
    // This waits for all the tasks to complete
    await Task.WhenAll(tasks);
    combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
    return combinedAddressResponses;
}
I'm looking for a way to speed things up and run the requests in parallel. Thanks.
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    // Build the collection of tasks by doing an asynchronous operation for each request.
    var tasks = requests.Select(async request =>
    {
        var combinedAddressResponse = await GetCombinedAddress(request);
        return combinedAddressResponse.Value;
    }).ToList();

    // Wait for all the tasks to complete and get the results.
    var results = await Task.WhenAll(tasks);
    return results;
}
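If the remote service can't cope with all the requests in flight at once (compare the 504s in the first question above), this pattern combines naturally with a SemaphoreSlim throttle. A sketch; the limit of 10 is an arbitrary assumption:

var throttler = new SemaphoreSlim(10); // at most 10 concurrent calls (assumed limit)

var tasks = requests.Select(async request =>
{
    await throttler.WaitAsync();
    try
    {
        var combinedAddressResponse = await GetCombinedAddress(request);
        return combinedAddressResponse.Value;
    }
    finally
    {
        throttler.Release(); // always free the slot, even on failure
    }
}).ToList();

var results = await Task.WhenAll(tasks);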
I'm trying to do a parallel SqlBulkCopy to multiple targets over a WAN, many of which may have slow connections and/or connection cutoffs; their connection speeds vary from 2 to 50 Mbit/s download, and I am sending from a connection with 1,000 Mbit/s upload; a lot of the targets need multiple retries to finish correctly.
I'm currently using a Parallel.ForEach on the GetConsumingEnumerable() of a BlockingCollection (queue); however, I either stumbled upon some bug, have problems fully understanding its purpose, or simply got something wrong.
The code never calls the CompleteAdding() method of the BlockingCollection;
it seems that somewhere in the parallel foreach loop some of the targets get lost.
Even if there are different approaches to this, and disregarding the kind of work being done in the loop, the BlockingCollection shouldn't behave the way it does in this example, should it?
In the foreach loop, I do the work and add the target to a results collection if it completed successfully, or re-add the target to the BlockingCollection in case of an error until the target reaches the max retries threshold; at that point I add it to the results collection.
In an additional Task, I loop until the count of the results collection equals the initial count of the targets; then I do the CompleteAdding() on the blocking collection.
I already tried using a locking object for the operations on the results collection (using a List<int> instead) and the queue, with no luck, but that shouldn't be necessary anyway. I also tried adding the retries to a separate collection, and re-adding those to the BlockingCollection in a different Task instead of in the Parallel.ForEach.
Just for fun I also tried compiling with .NET versions from 4.5 to 4.8, and different C# language versions.
Here is a simplified example:
List<int> targets = new List<int>();
for (int i = 0; i < 200; i++)
{
    targets.Add(0);
}

BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
ConcurrentBag<int> results = new ConcurrentBag<int>();
targets.ForEach(f => queue.Add(f));

// Bulk copy to the branch offices:
Task.Run(() =>
{
    while (results.Count < targets.Count)
    {
        Thread.Sleep(2000);
        Console.WriteLine($"Completed: {results.Count} / {targets.Count} | queue: {queue.Count}");
    }
    queue.CompleteAdding();
});

int MAX_RETRIES = 10;
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 50 };

Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
    try
    {
        // simulate a problem with the bulkcopy:
        throw new Exception();
        results.Add(target);
    }
    catch (Exception)
    {
        if (target < MAX_RETRIES)
        {
            target++;
            if (!queue.TryAdd(target))
                Console.WriteLine($"{target.ToString("D3")}: Error, can't add to queue!");
        }
        else
        {
            results.Add(target);
            Console.WriteLine($"Aborted after {target + 1} tries | {results.Count} / {targets.Count} items finished.");
        }
    }
});
I expected the count of the results-collection to be the exact count of the targets-list in the end, but it seems to never reach that number, which results in the BlockingCollection never being marked as completed, so the code never finishes.
I really don't understand why not all of the targets get added to the results-collection eventually! The added count always varies, and is mostly just shy of the expected final count.
EDIT: I removed the retry-part, and replaced the ConcurrentBag with a simple int-counter, and it still doesn't work most of the time:
List<int> targets = new List<int>();
for (int i = 0; i < 500; i++)
    targets.Add(0);

BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
//ConcurrentBag<int> results = new ConcurrentBag<int>();
int completed = 0;
targets.ForEach(f => queue.Add(f));

var thread = new Thread(() =>
{
    while (completed < targets.Count)
    {
        Thread.Sleep(2000);
        Console.WriteLine($"Completed: {completed} / {targets.Count} | queue: {queue.Count}");
    }
    queue.CompleteAdding();
});
thread.Start();

ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
    Interlocked.Increment(ref completed);
});
Sorry, found the answer: the default partitioner used by BlockingCollection and Parallel.ForEach chunks and buffers items, which causes the foreach loop to wait forever for enough items to fill the next chunk. For me, it sat there for a whole night without processing the last few items!
So, instead of:
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
    Interlocked.Increment(ref completed);
});
you have to use:
var partitioner = Partitioner.Create(queue.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering);
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(partitioner, options, target =>
{
    Interlocked.Increment(ref completed);
});
Parallel.ForEach is meant for data parallelism (i.e. processing 100K rows using all 8 cores), not concurrent operations. This is essentially a pub/sub and async problem, if not a pipeline problem. There's nothing for the CPU to do in this case; just start the async operations and wait for them to complete.
.NET has handled this since .NET 4.5 through the Dataflow classes and, lately, the lower-level System.Threading.Channels namespace (a Channels-based sketch follows the Dataflow example below).
In its simplest form, you can create an ActionBlock<> that takes a buffer and target connection and publishes the data. Let's say you use this method to send the data to a server:
async Task MyBulkCopyMethod(string connectionString, DataTable data)
{
    using (var bcp = new SqlBulkCopy(connectionString))
    {
        // Set up mappings etc.
        // ....
        await bcp.WriteToServerAsync(data);
    }
}
You can use this with an ActionBlock class with a configured degree of parallelism. Dataflow classes like ActionBlock have their own input, and where appropriate, output buffers, so there's no need to create a separate queue:
class DataMessage
{
    public string Connection { get; set; }
    public DataTable Data { get; set; }
}

...

var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 50,
    BoundedCapacity = 8
};

var block = new ActionBlock<DataMessage>(msg => MyBulkCopyMethod(msg.Connection, msg.Data), options);
We can start posting messages to the block now. By setting the capacity to 8 we ensure the input buffer won't get filled with large messages if the block is too slow. MaxDegreeOfParallelism controls how many operations run concurrently. Let's say we want to send the same data to many servers:
var data = .....;
var servers = new[] { connString1, connString2, .... };

var messages = from sv in servers
               select new DataMessage { Connection = sv, Data = data };

foreach (var msg in messages)
{
    await block.SendAsync(msg);
}

// Tell the block we are done
block.Complete();
// Await for all messages to finish processing
await block.Completion;
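For comparison, the lower-level System.Threading.Channels approach mentioned earlier could look roughly like this (my sketch, reusing DataMessage and MyBulkCopyMethod from above; the capacity and worker count are arbitrary):

using System.Threading.Channels;

// Bounded channel: writes wait when 8 items are already queued.
var channel = Channel.CreateBounded<DataMessage>(8);

// A fixed number of consumers drain the channel concurrently.
var workers = Enumerable.Range(0, 4)
    .Select(async _ =>
    {
        await foreach (var msg in channel.Reader.ReadAllAsync())
        {
            await MyBulkCopyMethod(msg.Connection, msg.Data);
        }
    })
    .ToArray();

foreach (var msg in messages)
{
    await channel.Writer.WriteAsync(msg);
}
channel.Writer.Complete();
await Task.WhenAll(workers);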
Retries
One possibility for retries is to use a retry loop in the worker function. A better idea would be to use a different block and post failed messages there.
var block = new ActionBlock<DataMessage>(async msg =>
{
    try
    {
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    }
    catch (SqlException exc) when (/* some retry condition */)
    {
        // Post without awaiting
        retryBlock.Post(msg);
    }
}, options);
When the original block completes we want to tell the retry block to complete as well, no matter what:
block.Completion.ContinueWith(_ => retryBlock.Complete());
Now we can await the retryBlock to complete.
That block could have a smaller DOP and perhaps a delay between attempts:
var retryOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 5
};

var retryBlock = new ActionBlock<DataMessage>(async msg =>
{
    await Task.Delay(1000);
    try
    {
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    }
    catch (Exception exc)
    {
        // Log, give up, or post to yet another block
    }
}, retryOptions);
This pattern can be repeated to create multiple levels of retry, or different conditions. It can also be used to create different priority workers by giving a larger DOP to high priority workers, or a larger delay to low priority workers.
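For example, a second, slower level could be chained the same way (a sketch; retryBlock2 and its settings are hypothetical):

// A more patient retry level for messages that failed twice.
var retryBlock2 = new ActionBlock<DataMessage>(async msg =>
{
    await Task.Delay(10000);
    await MyBulkCopyMethod(msg.Connection, msg.Data);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

// Each level completes the next one when it finishes.
retryBlock.Completion.ContinueWith(_ => retryBlock2.Complete());
await retryBlock2.Completion;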
I'm implementing simple data loader over HTTP, following tips from my previous question C# .NET Parallel I/O operation (with throttling), answered by Throttling asynchronous tasks.
I split loading and deserialization, assuming that one may be slower or faster than the other. I also want to throttle downloading, but don't want to throttle deserialization, so I'm using two blocks and one buffer.
Unfortunately I'm facing a problem where this pipeline sometimes processes fewer messages than it consumes (I know from the target server that I made exactly n requests, but I end up with fewer responses).
My method looks like this (no error handling):
public async Task<IEnumerable<DummyData>> LoadAsync(IEnumerable<Uri> uris)
{
    IList<DummyData> result;
    using (var client = new HttpClient())
    {
        var buffer = new BufferBlock<DummyData>();
        var downloader = new TransformBlock<Uri, string>(
            async u => await client.GetStringAsync(u),
            new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = _maxParallelism });
        var deserializer =
            new TransformBlock<string, DummyData>(
                s => JsonConvert.DeserializeObject<DummyData>(s),
                new ExecutionDataflowBlockOptions
                    { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        downloader.LinkTo(deserializer, linkOptions);
        deserializer.LinkTo(buffer, linkOptions);

        foreach (Uri uri in uris)
        {
            await downloader.SendAsync(uri);
        }
        downloader.Complete();
        await downloader.Completion;
        buffer.TryReceiveAll(out result);
    }
    return result;
}
So to be more specific: I have 100 URLs to load, but I get 90-99 responses. No errors, and the server handled 100 requests. This happens randomly; most of the time the code behaves correctly.
There are three issues with your code:
1. Awaiting the completion of the first block of the pipeline (downloader) instead of the last (buffer).
2. Using the TryReceiveAll method for retrieving the messages of the buffer block. The correct way to retrieve all messages from an unlinked block without introducing race conditions is to use the methods OutputAvailableAsync and TryReceive in a nested loop (a minimal sketch of this loop follows the code below). You can find examples here and here.
3. In case of a timeout the HttpClient throws an unexpected TaskCanceledException, and the TPL Dataflow blocks happen to ignore exceptions of this type. The combination of these two unfortunate realities means that by default any timeout occurrence will remain unobserved. To fix this problem you could change your code like this:
var downloader = new TransformBlock<Uri, string>(async url =>
{
    try
    {
        return await client.GetStringAsync(url);
    }
    catch (OperationCanceledException) { throw new TimeoutException(); }
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = _maxParallelism });
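The nested receive loop from the second point could look roughly like this (my sketch, using the buffer block from the question):

IList<DummyData> result = new List<DummyData>();
// Loop until the buffer completes and has no more output.
while (await buffer.OutputAvailableAsync())
{
    // Drain everything that is currently available.
    while (buffer.TryReceive(out DummyData item))
    {
        result.Add(item);
    }
}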
A fourth, unrelated issue is the use of the MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded option for the deserializer block. In the (hopefully unlikely) case that the deserializer is slower than the downloader, the deserializer will start queuing more and more work on the ThreadPool, keeping it permanently starved. This will not be good for the performance and responsiveness of your application, or for the health of the system as a whole. In practice there is rarely a reason to configure a CPU-bound block with a MaxDegreeOfParallelism larger than Environment.ProcessorCount.
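In concrete terms, the deserializer from the question could be capped like this (my suggestion):

var deserializer = new TransformBlock<string, DummyData>(
    s => JsonConvert.DeserializeObject<DummyData>(s),
    new ExecutionDataflowBlockOptions
        { MaxDegreeOfParallelism = Environment.ProcessorCount });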
I want to run all the tasks with WhenAll (not one by one).
But after that I need to update the list (the LastReport property) based on the results.
I think I have a solution, but I would like to check if there is a better way.
The idea is to:
Run all tasks
Remember relation between configuration and task
Update configuration
My solution is:
var lastReportAllTasks = new List<Task<Dictionary<string, string>>>();
var configurationTaskRelation = new Dictionary<int, Task<Dictionary<string, string>>>();

foreach (var configuration in MachineConfigurations)
{
    var task = machineService.GetReports(configuration);
    lastReportAllTasks.Add(task);
    configurationTaskRelation.Add(configuration.Id, task);
}

await Task.WhenAll(lastReportAllTasks);

foreach (var configuration in MachineConfigurations)
{
    var lastReportTask = configurationTaskRelation[configuration.Id];
    configuration.LastReport = await lastReportTask;
}
The Select function can be asynchronous itself. You can await the report and return both the configuration and the result in the same result object (anonymous type or tuple, whatever you prefer):
var tasks = MachineConfigurations.Select(async conf =>
{
    var report = await machineService.GetReports(conf);
    return new { conf, report };
});

var results = await Task.WhenAll(tasks);

foreach (var pair in results)
{
    pair.conf.LastReport = pair.report;
}
EDIT - Loops and error handling
As Servy suggested, Task.WhenAll can be omitted and the awaiting can be moved inside the loop:
foreach (var task in tasks)
{
    var pair = await task;
    pair.conf.LastReport = pair.report;
}
The tasks will still execute concurrently. In case of exception though, some configuration objects will be modified and some not.
In general, this would be an ugly situation, requiring extra exception handling code to clean up the modified objects. Exception handling is a lot easier when modifications are done on-the-side and finalized/applied when the happy path completes. That's one reason why updating the Configuration objects inside the Select() requires careful consideration.
In this particular case though it may be better to "skip" the failed reports, possibly move them to an error queue and reprocess them at a later time. It may be better to have partial results than no results at all, as long as this behaviour is expected:
foreach (var task in tasks)
{
    try
    {
        var pair = await task;
        pair.conf.LastReport = pair.report;
    }
    catch (Exception exc)
    {
        // Make sure the error is logged
        Log.Error(exc);
        ErrorQueue.Enqueue(new ProcessingError(task, exc));
    }
}
//Handle errors after the loop
EDIT 2 - Dataflow
For completeness, I do have several thousand ticket reports to generate each day, and each GDS call (the service through which every travel agency sells tickets) takes considerable time. I can't run all requests at the same time; I start getting server serialization errors if I try more than 10 concurrent requests. I can't retry everything either.
In this case I used TPL Dataflow combined with some Railway-oriented programming tricks. An ActionBlock with a DOP of 8 processes the ticket requests. The results are wrapped in a Success class and sent to the next block. Failed requests and exceptions are wrapped in a Failure class and sent to another block. Both classes implement IFlowEnvelope, which has a Success flag. Yes, that's F# Discriminated Union envy.
This is combined with some retry logic for timeouts etc.
In pseudocode the pipeline looks like this:
var reportingBlock = new TransformBlock<Ticket, IFlowEnvelope<TicketReport>>(reportFunc, dopOptions);
var happyBlock = new ActionBlock<IFlowEnvelope<TicketReport>>(storeToDb);
var errorBlock = new ActionBlock<IFlowEnvelope<TicketReport>>(logError);

reportingBlock.LinkTo(happyBlock, linkOptions, msg => msg.Success);
reportingBlock.LinkTo(errorBlock, linkOptions, msg => !msg.Success);

foreach (var ticket in tickets)
{
    reportingBlock.Post(ticket);
}
reportFunc catches any exceptions and wraps them as Failure<T> objects:
async Task<IFlowEnvelope<TicketReport>> reportFunc(Ticket ticket)
{
    try
    {
        // Do the heavy processing, producing a report
        return new Success<TicketReport>(report);
    }
    catch (Exception exc)
    {
        // Construct an error message, msg
        return new Failure<TicketReport>(msg);
    }
}
The real pipeline includes steps that parse daily reports and individual tickets. Each call to the GDS takes 1-6 seconds so the complexity of the pipeline is justified.
I think you don't need Lists or Dictionaries. Why not a simple loop which updates LastReport with the results?
foreach (var configuration in MachineConfigurations)
{
    configuration.LastReport = await machineService.GetReports(configuration);
}
For executing all reports "in parallel":
Func<Configuration, Task> loadReport =
    async config => config.LastReport = await machineService.GetReports(config);

await Task.WhenAll(MachineConfigurations.Select(loadReport));
And a very poor attempt at being more functional:
Func<Configuration, Task<Configuration>> getConfigWithReportAsync =
    async config =>
    {
        var report = await machineService.GetReports(config);
        return new Configuration
        {
            Id = config.Id,
            LastReport = report
        };
    };

var configsWithUpdatedReports =
    await Task.WhenAll(MachineConfigurations.Select(getConfigWithReportAsync));
using System.Linq;

var taskResultsWithConfiguration = MachineConfigurations.Select(conf =>
    new { Conf = conf, Task = machineService.GetReports(conf) }).ToList();

await Task.WhenAll(taskResultsWithConfiguration.Select(pair => pair.Task));

foreach (var pair in taskResultsWithConfiguration)
    pair.Conf.LastReport = pair.Task.Result;
I have a C# requirement for individually processing a 'great many' (perhaps > 100,000) records. Running this process sequentially is proving to be very slow, with each record taking a good second or so to complete (with a timeout error set at 5 seconds).
I would like to try running these tasks asynchronously by using a set number of worker 'threads' (I use the term 'thread' here cautiously, as I am not sure if I should be looking at a thread, a task, or something else).
I have looked at the ThreadPool, but I can't imagine it could queue the volume of requests required. My ideal pseudocode would look something like this...
public void ProcessRecords()
{
    SetMaxNumberOfThreads(20);
    MyRecord rec;
    while ((rec = GetNextRecord()) != null)
    {
        var task = WaitForNextAvailableThreadFromPool(ProcessRecord(rec));
        task.Start();
    }
}
I will also need a mechanism that the processing method can report back to the parent/calling class.
Can anyone point me in the right direction with perhaps some example code?
A possible simple solution would be to use a TPL Dataflow block, which is a higher abstraction over the TPL with configurations for degree of parallelism and so forth. You simply create the block (an ActionBlock in this case), Post everything to it, wait asynchronously for completion, and TPL Dataflow handles all the rest for you:
var block = new ActionBlock<MyRecord>(
    rec => ProcessRecord(rec),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    block.Post(rec);
}

block.Complete();
await block.Completion;
Another benefit is that the block starts working as soon as the first record arrives and not only when all the records have been received.
If you need to report back on each record you can use a TransformBlock to do the actual processing and link an ActionBlock to it that does the updates:
var transform = new TransformBlock<MyRecord, Report>(rec =>
{
    ProcessRecord(rec);
    return GenerateReport(rec);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

var reporter = new ActionBlock<Report>(report =>
{
    RaiseEvent(report); // Or any other mechanism...
});

transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });

MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    transform.Post(rec);
}

transform.Complete();
await transform.Completion;
Have you thought about using parallel processing with Actions?
I.e., create a method to process a single record, add each record's method call as an Action to a list, and then perform a Parallel.ForEach on the list.
Dim list As New List(Of Action)
list.Add(New Action(Sub() MyMethod(myParameter)))
Parallel.ForEach(list, Sub(t) t.Invoke())
This is in VB.NET, but I think you get the gist.