Slow down IObserver.OnNext while item handling is in progress [duplicate] - c#

Is there a way in C# Rx to handle backpressure?
I'm trying to call a web api from the results of a paged query. This web api is very fragile, and I need to have no more than, say, 3 concurrent calls, so the program should be something like:
Fetch a page from the db
Call the web api, with a maximum of three concurrent calls, for each record on the page
Save the results back to the db
Fetch another page and repeat until there are no more results.
I'm not really getting the sequence that I'm after; basically the db fetches all the records regardless of whether they can be processed or not.
I've tried a variety of things, including tweaking the ObserveOn operator and implementing a semaphore, among other things. Could I get a little guidance on implementing something like this?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading;
using System.Threading.Tasks;
using Castle.Core.Internal;
using Xunit;
using Xunit.Abstractions;

namespace ProductValidation.CLI.Tests.Services
{
    public class Example
    {
        private readonly ITestOutputHelper output;

        public Example(ITestOutputHelper output)
        {
            this.output = output;
        }

        [Fact]
        public async Task RunsObservableToCompletion()
        {
            var repo = new Repository(output);
            var client = new ServiceClient(output);
            var results = repo.FetchRecords()
                .Select(x => client.FetchMoreInformation(x).ToObservable())
                .Merge(1)
                .Do(async x => await repo.Save(x));
            await results.LastOrDefaultAsync();
        }
    }

    public class Repository
    {
        private readonly ITestOutputHelper output;

        public Repository(ITestOutputHelper output)
        {
            this.output = output;
        }

        public IObservable<int> FetchRecords()
        {
            return Observable.Create<int>(async (observer) =>
            {
                var page = 1;
                var products = await FetchPage(page);
                while (!products.IsNullOrEmpty())
                {
                    foreach (var product in products)
                    {
                        observer.OnNext(product);
                    }
                    page += 1;
                    products = await FetchPage(page);
                }
                observer.OnCompleted();
            })
            .ObserveOn(SynchronizationContext.Current);
        }

        private async Task<IEnumerable<int>> FetchPage(int page)
        {
            // Simulate fetching a paged query.
            await Task.Delay(500).ToObservable().ObserveOn(new TaskPoolScheduler(new TaskFactory()));
            output.WriteLine("Fetching page {0}", page);
            if (page >= 4) return Enumerable.Empty<int>();
            return Enumerable.Range(1, 3).Select(_ => page);
        }

        public async Task Save(string id)
        {
            await Task.Delay(50); //Simulates latency
        }
    }

    public class ServiceClient
    {
        private readonly ITestOutputHelper output;
        private readonly SemaphoreSlim semaphore;

        public ServiceClient(ITestOutputHelper output)
        {
            this.output = output;
            this.semaphore = new SemaphoreSlim(2);
        }

        public async Task<string> FetchMoreInformation(int id)
        {
            try
            {
                output.WriteLine("Calling the web client for {0}", id);
                await semaphore.WaitAsync(); // Protection for the webapi not sending too many calls
                await Task.Delay(1000); //Simulates latency
                return id.ToString();
            }
            finally
            {
                semaphore.Release();
            }
        }
    }
}

Rx does not support backpressure, so there is no easy way to fetch the records from the DB at the same tempo that the records are processed. Maybe you could use a Subject<Unit> as a signaling mechanism, push a value every time a record is processed, and devise a way to use these signals at the producing site so that a new record is fetched from the DB whenever a signal is received. But it would be a messy and unidiomatic solution. TPL Dataflow is a more suitable tool than Rx for this kind of work: it natively supports the BoundedCapacity configuration option.
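For illustration only, here is a rough sketch of what that could look like with TPL Dataflow, reusing the shapes from the posted code. It assumes the page-fetching logic is callable from the producer loop (FetchPage is private in the question), and the capacity/parallelism numbers are arbitrary:
using System.Linq;
using System.Threading.Tasks.Dataflow;

// Sketch, not a drop-in replacement for the posted test.
var callApiBlock = new TransformBlock<int, string>(
    id => client.FetchMoreInformation(id),
    new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = 10,         // producer is suspended while the buffer is full
        MaxDegreeOfParallelism = 3    // at most 3 concurrent web api calls
    });

var saveBlock = new ActionBlock<string>(
    result => repo.Save(result),
    new ExecutionDataflowBlockOptions { BoundedCapacity = 10 });

callApiBlock.LinkTo(saveBlock, new DataflowLinkOptions { PropagateCompletion = true });

// Producer: fetch pages only as fast as the pipeline accepts records.
var page = 1;
var products = await repo.FetchPage(page);   // assumes FetchPage is made accessible
while (products.Any())
{
    foreach (var product in products)
    {
        await callApiBlock.SendAsync(product); // awaits when BoundedCapacity is reached
    }
    products = await repo.FetchPage(++page);
}
callApiBlock.Complete();
await saveBlock.Completion;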
Some comments regarding the code you've posted, that are not directly related to the backpressure issue:
The Merge operator with a maxConcurrent parameter imposes a limit on the concurrent subscriptions to the inner sequences, but this has no effect if the inner sequences are already up and running. So you have to ensure that the inner sequences are cold, and a handy way to do this is the Defer operator:
.Select(x => Observable.Defer(() =>
    client.FetchMoreInformation(x).ToObservable()))
A more common way to convert asynchronous methods to deferred observable sequences is the FromAsync operator:
.Select(x => Observable.FromAsync(() => client.FetchMoreInformation(x)))
Btw the Do operator does not understand async delegates, so instead of:
.Do(async x => await repo.Save(x));
...which creates async void lambdas, it's better to do this:
.Select(x => Observable.FromAsync(() => repo.Save(x)))
.Merge(1);
Update: Here is an example of how you could use a SemaphoreSlim in order to implement backpressure in Rx:
const int boundedCapacity = 10;
using var semaphore = new SemaphoreSlim(boundedCapacity, boundedCapacity);

IObservable<int> results = repo
    .FetchRecords(semaphore)
    .Select(x => Observable.FromAsync(() => client.FetchMoreInformation(x)))
    .Merge(1)
    .Select(x => Observable.FromAsync(() => repo.Save(x)))
    .Merge(1)
    .Do(_ => semaphore.Release());

await results.DefaultIfEmpty();
And inside the FetchRecords method:
//...
await semaphore.WaitAsync();
observer.OnNext(product);
//...
This is a fragile solution, because it depends on propagating all elements through the pipeline. If in the future you decide to include filtering or throttling inside the pipeline, then the one-to-one relationship between WaitAsync and Release will be violated, with the most probable outcome being a deadlocked pipeline.

Related

What is the fastest and most efficient ways to make a large number of C# WebRequests?

I have a list of URLs (thousands), I want to asynchronously get page data from each URL as fast as possible without putting extreme load on the CPU.
I have tried using threading but it still feels quite slow:
public static ConcurrentQueue<string> List = new ConcurrentQueue<string>(); //URL List (assume I added them already)

public static void Threading()
{
    for (int i = 0; i < 100; i++) //100 threads
    {
        Thread thread = new Thread(new ThreadStart(Task));
        thread.Start();
    }
}

public static void Task()
{
    while (!List.IsEmpty)
    {
        List.TryDequeue(out string URL);
        //GET REQUEST HERE
    }
}
Is there any better way to do this? I want to do this asynchronously but I can't figure out how to do it, and I don't want to sacrifice speed or CPU efficiency to do so.
Thanks :)
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
public static IObservable<(string url, string content)> GetAllUrls(List<string> urls) =>
    Observable
        .Using(
            () => new HttpClient(),
            hc =>
                from url in urls.ToObservable()
                from response in Observable.FromAsync(() => hc.GetAsync(url))
                from content in Observable.FromAsync(() => response.Content.ReadAsStringAsync())
                select (url, content));
That allows you to consume the results in a couple of ways.
You can process them as they get produced:
IDisposable subscription =
GetAllUrls(urlsx).Subscribe(x => Console.WriteLine(x.content));
Or you can get all of them produced and then await the full results:
(string url, string content)[] results = await GetAllUrls(urlsx).ToArray();
You are best off using HttpClient which allows async Task requests.
Just store each task in a list, and await the whole list. To prevent too many requests at once, wait for any single one to complete if there are too many, and remove the completed one from the list.
const int maxDegreeOfParallelism = 100;
static HttpClient _client = new HttpClient();

public static async Task GetAllUrls(List<string> urls)
{
    var tasks = new List<Task>(urls.Count);
    foreach (var url in urls)
    {
        if (tasks.Count == maxDegreeOfParallelism) // this prevents too many requests at once
            tasks.Remove(await Task.WhenAny(tasks));
        tasks.Add(GetUrl(url));
    }
    await Task.WhenAll(tasks);
}

private static async Task GetUrl(string url)
{
    using var response = await _client.GetAsync(url);
    // handle response here
    var responseStr = await response.Content.ReadAsStringAsync(); // whatever
    // do stuff etc
}

C# LanguageExt - combine multiple async calls into one grouped call

I have a method that looks up an item asynchronously from a datastore;
class MyThing {}
Task<Try<MyThing>> GetThing(int thingId) {...}
I want to look up multiple items from the datastore, and wrote a new method to do this. I also wrote a helper method that will take multiple Try<T> and combine their results into a single Try<IEnumerable<T>>.
public static class TryExtensions
{
    public static Try<IEnumerable<T>> Collapse<T>(this IEnumerable<Try<T>> items)
    {
        var failures = items.Fails().ToArray();
        return failures.Any() ?
            Try<IEnumerable<T>>(new AggregateException(failures)) :
            Try(items.Select(i => i.Succ(a => a).Fail(Enumerable.Empty<T>())));
    }
}
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
    var results = new List<Try<MyThing>>();
    foreach (var id in ids)
    {
        var thing = await GetThing(id);
        results.Add(thing);
    }
    return results.Collapse().Map(p => p.ToArray());
}
Another way to do it would be like this;
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
    var tasks = ids.Select(async id => await GetThing(id)).ToArray();
    await Task.WhenAll(tasks);
    return tasks.Select(t => t.Result).Collapse().Map(p => p.ToArray());
}
The problem with this is that all the tasks will run in parallel and I don't want to hammer my datastore with lots of parallel requests. What I really want is to make my code functional, using monadic principles and features of LanguageExt. Does anyone know how to achieve this?
Update
Thanks for the suggestion @MatthewWatson. This is what it looks like with the SemaphoreSlim:
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
    var mutex = new SemaphoreSlim(1);
    var results = ids.Select(async id =>
    {
        await mutex.WaitAsync();
        try { return await GetThing(id); }
        finally { mutex.Release(); }
    }).ToArray();
    await Task.WhenAll(results);
    return results.Select(t => t.Result).Collapse().Map(p => p.ToArray());
}
Problem is, this is still not very monadic / functional, and ends up with more lines of code than my original code with a foreach block.
In the "Another way" you almost achieved your goal when you called:
var tasks = ids.Select(async id => await GetThing(id)).ToArray();
Except that Tasks doesn't run sequentially so you will end up with many queries hitting your datastore, which is caused by .ToArray() and Task.WhenAll. Once you called .ToArray() it allocated and started the Tasks already, so if you can "tolerate" one foreach to achieve the sequential tasks running, like this:
public static class TaskExtensions
{
    public static async Task RunSequentially<T>(this IEnumerable<Task<T>> tasks)
    {
        foreach (var task in tasks) await task;
    }
}
Despite that running a "loop" of queries is not a quite good practice
in general, unless you have in some background service and some
special scenario, leveraging this to the Database engine through
WHERE thingId IN (...) in general is a better option. Even you
have big amount of thingIds we can slice it into small 10s, 100s.. to
narrow the WHERE IN footprint.
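As a rough sketch of that batching idea (GetThingsByIds is a hypothetical method that would issue a single WHERE thingId IN (...) query per batch; it is not part of the question's API):
// Hypothetical batching sketch: one round-trip per batch instead of one per id.
const int batchSize = 100;
var batches = ids
    .Select((id, index) => (id, index))
    .GroupBy(x => x.index / batchSize, x => x.id); // or ids.Chunk(batchSize) on .NET 6+

var things = new List<MyThing>();
foreach (var batch in batches)
{
    things.AddRange(await GetThingsByIds(batch)); // WHERE thingId IN (...) for this batch
}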
Back to our RunSequentially: I would have liked to make it more functional, for example like this:
tasks.ToList().ForEach(async task => await task);
But sadly this will still run the tasks more or less in parallel, because the async void lambda returns to ForEach at its first await.
So the final usage should be:
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
    var tasks = ids.Select(id => GetThing(id)); // remember: don't use .ToArray() or .ToList()
    await tasks.RunSequentially();
    return tasks.Select(t => t.Result).Collapse().Map(p => p.ToArray());
}
Another, overkill, functional solution is to get Lazy in a Queue recursively!
Instead of GetThing, use a lazy version, GetLazyThing, that returns Lazy<Task<Try<MyThing>>> simply by wrapping GetThing:
new Lazy<Task<Try<MyThing>>>(() => GetThing(id))
Now using couple extensions/functions:
public static async Task RecRunSequentially<T>(this IEnumerable<Lazy<Task<T>>> tasks)
{
    var queue = tasks.EnqueueAll();
    await RunQueue(queue);
}

public static Queue<T> EnqueueAll<T>(this IEnumerable<T> list)
{
    var queue = new Queue<T>();
    list.ToList().ForEach(m => queue.Enqueue(m));
    return queue;
}

public static async Task RunQueue<T>(Queue<Lazy<Task<T>>> queue)
{
    if (queue.Count > 0)
    {
        var task = queue.Dequeue();
        await task.Value; // this unwraps the Lazy object content
        await RunQueue(queue);
    }
}
Finally:
var lazyTasks = ids.Select(id => GetLazyThing(id));
await lazyTasks.RecRunSequentially();
// Now collapse and map as you like
Update
However, if you don't like the fact that EnqueueAll and RunQueue are not "pure", we can take the following approach with the same Lazy trick:
public static async Task AwaitSequentially<T>(this Lazy<Task<T>>[] array, int index = 0)
{
    if (array == null || index < 0 || index >= array.Length) return;
    await array[index].Value;
    await AwaitSequentially(array, index + 1); // ++index is not pure :)
}
Now:
var lazyTasks = ids.Select(id => GetLazyThing(id));
await lazyTasks.ToArray().AwaitSequentially();
// Now collapse and map as you like

Unwrapping IObservable<Task<T>> into IObservable<T> with order preservation

Is there a way to unwrap the IObservable<Task<T>> into IObservable<T> keeping the same order of events, like this?
Tasks: ----a-------b--c----------d------e---f---->
Values: -------A-----------B--C------D-----E---F-->
Let's say I have a desktop application that consumes a stream of messages, some of which require heavy post-processing:
IObservable<Message> streamOfMessages = ...;
IObservable<Task<Result>> streamOfTasks = streamOfMessages
.Select(async msg => await PostprocessAsync(msg));
IObservable<Result> streamOfResults = ???; // unwrap streamOfTasks
I imagine two ways of dealing with that.
First, I can subscribe to streamOfTasks using the asynchronous event handler:
streamOfTasks.Subscribe(async task =>
{
    var result = await task;
    Display(result);
});
Second, I can convert streamOfTasks using Observable.Create, like this:
var streamOfResults =
    from task in streamOfTasks
    from value in Observable.Create<T>(async (obs, cancel) =>
    {
        var v = await task;
        obs.OnNext(v);
        // TODO: don't know when to call obs.OnComplete()
    })
    select value;

streamOfResults.Subscribe(result => Display(result));
Either way, the order of messages is not preserved: some later messages that
don't need any post-processing come out faster than earlier messages that
require post-processing. Both my solutions handle the incoming messages
in parallel, but I'd like them to be processed sequentially, one by one.
I can write a simple task queue to process just one task at a time,
but perhaps it's overkill. It seems to me that I'm missing something obvious.
UPD. I wrote a sample console program to demonstrate my approaches. None of the solutions so far preserve the original order of events. Here is the output of the program:
Timer: 0
Timer: 1
Async handler: 1
Observable.Create: 1
Observable.FromAsync: 1
Timer: 2
Async handler: 2
Observable.Create: 2
Observable.FromAsync: 2
Observable.Create: 0
Async handler: 0
Observable.FromAsync: 0
Here is the complete source code:
// "C:\Program Files (x86)\MSBuild\14.0\Bin\csc.exe" test.cs /r:System.Reactive.Core.dll /r:System.Reactive.Linq.dll /r:System.Reactive.Interfaces.dll
using System;
using System.Reactive;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Threading.Tasks;
class Program
{
static void Main()
{
Console.WriteLine("Press ENTER to exit.");
// the source stream
var timerEvents = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(1));
timerEvents.Subscribe(x => Console.WriteLine($"Timer: {x}"));
// solution #1: using async event handler
timerEvents.Subscribe(async x =>
{
var result = await PostprocessAsync(x);
Console.WriteLine($"Async handler: {x}");
});
// solution #2: using Observable.Create
var processedEventsV2 =
from task in timerEvents.Select(async x => await PostprocessAsync(x))
from value in Observable.Create<long>(async (obs, cancel) =>
{
var v = await task;
obs.OnNext(v);
})
select value;
processedEventsV2.Subscribe(x => Console.WriteLine($"Observable.Create: {x}"));
// solution #3: using FromAsync, as answered by #Enigmativity
var processedEventsV3 =
from msg in timerEvents
from result in Observable.FromAsync(() => PostprocessAsync(msg))
select result;
processedEventsV3.Subscribe(x => Console.WriteLine($"Observable.FromAsync: {x}"));
Console.ReadLine();
}
static async Task<long> PostprocessAsync(long x)
{
// some messages require long post-processing
if (x % 3 == 0)
{
await Task.Delay(TimeSpan.FromSeconds(2.5));
}
// and some don't
return x;
}
}
Combining @Enigmativity's simple approach with @VMAtm's idea of attaching a counter, and some code snippets from this SO question, I came up with this solution:
// usage
var processedStream = timerEvents.SelectAsync(async t => await PostprocessAsync(t));
processedStream.Subscribe(x => Console.WriteLine($"Processed: {x}"));
// my sample console program prints the events ordered properly:
Timer: 0
Timer: 1
Timer: 2
Processed: 0
Processed: 1
Processed: 2
Timer: 3
Timer: 4
Timer: 5
Processed: 3
Processed: 4
Processed: 5
....
Here is my SelectAsync extension method that transforms IObservable<TSource> into IObservable<TResult> via an asynchronous selector, keeping the original order of events:
public static IObservable<TResult> SelectAsync<TSource, TResult>(
    this IObservable<TSource> src,
    Func<TSource, Task<TResult>> selectorAsync)
{
    // using local variable for counter is easier than src.Scan(...)
    var counter = 0;
    var streamOfTasks =
        from source in src
        from result in Observable.FromAsync(async () => new
        {
            Index = Interlocked.Increment(ref counter) - 1,
            Result = await selectorAsync(source)
        })
        select result;

    // buffer the results coming out of order
    return Observable.Create<TResult>(observer =>
    {
        var index = 0;
        var buffer = new Dictionary<int, TResult>();
        return streamOfTasks.Subscribe(item =>
        {
            buffer.Add(item.Index, item.Result);
            TResult result;
            while (buffer.TryGetValue(index, out result))
            {
                buffer.Remove(index);
                observer.OnNext(result);
                index++;
            }
        });
    });
}
I'm not particularly satisfied with my solution as it looks too complex to me, but at least it doesn't require any external dependencies. I'm using a simple Dictionary here to buffer and reorder the task results, because the subscriber need not be thread-safe (the subscriptions are never called concurrently).
Any comments or suggestions are welcome. I'm still hoping to find a native Rx way of doing this without a custom buffering extension method.
The Rx library contains three operators that can unwrap an observable sequence of tasks: Concat, Merge and Switch. All three accept a single source argument of type IObservable<Task<T>> and return an IObservable<T>. Here are their descriptions from the documentation:
Concat
Concatenates all task results, as long as the previous task terminated successfully.
Merge
Merges results from all source tasks into a single observable sequence.
Switch
Transforms an observable sequence of tasks into an observable sequence producing values only from the most recent observable sequence. Each time a new task is received, the previous task's result is ignored.
In other words the Concat returns the results in their original order, the Merge returns the results in order of completion, and the Switch filters out any results from tasks that didn't complete before the next task was emitted. So your problem can be solved by just using the built-in Concat operator. No custom operator is needed.
var streamOfResults = streamOfTasks
    .Select(async task =>
    {
        var result1 = await task;
        var result2 = await PostprocessAsync(result1);
        return result2;
    })
    .Concat();
The tasks are already started before they are emitted by the streamOfTasks. In other words, they emerge in a "hot" state. So the fact that the Concat operator awaits them one after the other has no consequence regarding the concurrency of the operations. It only affects the order of their results. This would be a consideration if, instead of hot tasks, you had cold observables, like those created by the Observable.FromAsync and Observable.Create methods, in which case Concat would execute the operations sequentially.
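For comparison, here is a minimal sketch of that cold variant, built from the question's streamOfMessages and PostprocessAsync, where Concat really does serialize the work while still preserving the order:
// Each inner observable is cold: post-processing starts only when Concat subscribes
// to it, so the messages are processed strictly one at a time, in their original order.
IObservable<Result> sequentialResults = streamOfMessages
    .Select(msg => Observable.FromAsync(() => PostprocessAsync(msg)))
    .Concat();
The trade-off is that the post-processing is then no longer concurrent.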
Is the following simple approach an answer for you?
IObservable<Result> streamOfResults =
    from msg in streamOfMessages
    from result in Observable.FromAsync(() => PostprocessAsync(msg))
    select result;
To maintain the order of events you can funnel your stream into a TransformBlock from TPL Dataflow. The TransformBlock would execute your post-processing logic and will maintain the order of its output by default.
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using NUnit.Framework;

namespace HandlingStreamInOrder {

    [TestFixture]
    public class ItemHandlerTests {

        [Test]
        public async Task Items_Are_Output_In_The_Same_Order_As_They_Are_Input() {
            var itemHandler = new ItemHandler();

            var timerEvents = Observable.Timer(TimeSpan.Zero, TimeSpan.FromMilliseconds(250));
            timerEvents.Subscribe(async x => {
                var data = (int)x;
                Console.WriteLine($"Value Produced: {x}");
                var dataAccepted = await itemHandler.SendAsync((int)data);
                if (dataAccepted) {
                    InputItems.Add(data);
                }
            });

            await Task.Delay(5000);

            itemHandler.Complete();
            await itemHandler.Completion;

            CollectionAssert.AreEqual(InputItems, itemHandler.OutputValues);
        }

        private IList<int> InputItems {
            get;
        } = new List<int>();
    }

    public class ItemHandler {

        public ItemHandler() {
            var options = new ExecutionDataflowBlockOptions() {
                BoundedCapacity = DataflowBlockOptions.Unbounded,
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                EnsureOrdered = true
            };

            PostProcessBlock = new TransformBlock<int, int>((Func<int, Task<int>>)PostProcess, options);

            var output = PostProcessBlock.AsObservable().Subscribe(x => {
                Console.WriteLine($"Value Output: {x}");
                OutputValues.Add(x);
            });
        }

        public async Task<bool> SendAsync(int data) {
            return await PostProcessBlock.SendAsync(data);
        }

        public void Complete() {
            PostProcessBlock.Complete();
        }

        public Task Completion {
            get { return PostProcessBlock.Completion; }
        }

        public IList<int> OutputValues {
            get;
        } = new List<int>();

        private IPropagatorBlock<int, int> PostProcessBlock {
            get;
        }

        private async Task<int> PostProcess(int data) {
            if (data % 3 == 0) {
                await Task.Delay(TimeSpan.FromSeconds(2));
            }
            return data;
        }
    }
}
Rx and TPL Dataflow can easily be combined here, and TPL Dataflow preserves the order of events by default, so your code could be something like this:
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static async Task<long> PostprocessAsync(long x) { ... }

IObservable<Message> streamOfMessages = ...;

var streamOfTasks = new TransformBlock<long, long>(
    async msg => await PostprocessAsync(msg),
    // set the concurrency level for messages to handle
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

// easily convert block into observable
IObservable<long> streamOfResults = streamOfTasks.AsObservable();
Edit: Rx extensions are meant to be a reactive pipeline of events for the UI. As this type of application is in general single-threaded, messages end up being handled in order. But in general, events in C# aren't thread-safe, so you have to provide some additional logic to preserve the order.
If you don't like the idea of introducing another dependency, you need to track the operation number with the Interlocked class, something like this:
// counter for operations get started
int operationNumber = 0;
// counter for operations get done
int doneNumber = 0;
...
var currentOperationNumber = Interlocked.Increment(ref operationNumber);
...
while (Interlocked.CompareExchange(ref doneNumber, currentOperationNumber + 1, currentOperationNumber) != currentOperationNumber)
{
// spin once here
}
// handle event
Interlocked.Increment(ref doneNumber);

How to properly execute a List of Tasks async in C#

I have a list of objects that I need to run a long-running process on, and I would like to kick them off asynchronously, then when they are all finished, return them as a list to the calling method. I've been trying different methods that I have found; however, it appears that the processes are still running synchronously in the order that they are in the list. So I am sure that I am missing something in how to execute a list of tasks.
Here is my code:
public async Task<List<ShipmentOverview>> GetShipmentByStatus(ShipmentFilterModel filter)
{
    if (string.IsNullOrEmpty(filter.Status))
    {
        throw new InvalidShipmentStatusException(filter.Status);
    }

    var lookups = GetLookups(false, Brownells.ConsolidatedShipping.Constants.ShipmentStatusType);
    var lookup = lookups.SingleOrDefault(sd => sd.Name.ToLower() == filter.Status.ToLower());
    if (lookup != null)
    {
        filter.StatusId = lookup.Id;
        var shipments = Shipments.GetShipments(filter);

        var tasks = shipments.Select(async model => await GetOverview(model)).ToList();
        ShipmentOverview[] finishedTask = await Task.WhenAll(tasks);
        return finishedTask.ToList();
    }
    else
    {
        throw new InvalidShipmentStatusException(filter.Status);
    }
}

private async Task<ShipmentOverview> GetOverview(ShipmentModel model)
{
    String version;
    var user = AuthContext.GetUserSecurityModel(Identity.Token, out version) as UserSecurityModel;
    var profile = AuthContext.GetProfileSecurityModel(user.Profiles.First());

    var overview = new ShipmentOverview
    {
        Id = model.Id,
        CanView = true,
        CanClose = profile.HasFeatureAction("Shipments", "Close", "POST"),
        CanClear = profile.HasFeatureAction("Shipments", "Clear", "POST"),
        CanEdit = profile.HasFeatureAction("Shipments", "Get", "PUT"),
        ShipmentNumber = model.ShipmentNumber.ToString(),
        ShipmentName = model.Name,
    };

    var parcels = Shipments.GetParcelsInShipment(model.Id);
    overview.NumberParcels = parcels.Count;

    var orders = parcels.Select(s => WareHouseClient.GetOrderNumberFromParcelId(s.ParcelNumber)).ToList();
    overview.NumberOrders = orders.Distinct().Count();

    //check validations
    var vals = Shipments.GetShipmentValidations(model.Id);
    if (model.ValidationTypeId == Constants.OrderValidationType)
    {
        if (vals.Count > 0)
        {
            overview.NumberOrdersTotal = vals.Count();
            overview.NumberParcelsTotal = vals.Sum(s => WareHouseClient.GetParcelsPerOrder(s.ValidateReference));
        }
    }

    return overview;
}
It looks like you're using asynchronous methods when you really want threads.
Asynchronous methods yield control back to the calling method when an async method is awaited, then resume once the awaited work has completed. You can see how it works here.
Basically, the main usefulness of async/await methods is not locking the UI, so that it stays responsive.
If you want to fire off multiple pieces of work in parallel, you will want to use threads, like such:
using System.Threading.Tasks;

public void MainMethod() {
    // Parallel.ForEach will automagically run the "right" number of threads in parallel
    Parallel.ForEach(shipments, shipment => ProcessShipment(shipment));
    // do something when all shipments have been processed
}

public void ProcessShipment(Shipment shipment) { ... }
Marking the method as async doesn't auto-magically make it execute in parallel. Since you're not using await at all, it will in fact execute completely synchronously, as if it weren't async. You might have read somewhere that async makes functions execute asynchronously, but this simply isn't true - forget it. The only thing it does is build a state machine to handle task continuations for you when you use await, and build all the code to manage those tasks and their error handling.
If your code is mostly I/O bound, use the asynchronous APIs with await to make sure the methods actually execute in parallel. If they are CPU bound, Task.Run (or Parallel.ForEach) will work best.
Also, there's no point in doing .Select(async model => await GetOverview(model)). It's almost equivalent to .Select(model => GetOverview(model)). In any case, since the method doesn't actually await anything, it will be executed while doing the Select, long before you get to the Task.WhenAll.
Given this, even the GetShipmentByStatus's async is pretty much useless - you only use await to await the Task.WhenAll, but since all the tasks are already completed by that point, it will simply complete synchronously.
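To make the CPU-bound option concrete, here is a minimal sketch (my own illustration, not the poster's code), assuming the question's GetOverview signature, where each call is offloaded to the thread pool so that Task.WhenAll actually has pending work to await:
// Sketch: GetOverview is the (effectively synchronous) method from the question.
var tasks = shipments
    .Select(model => Task.Run(() => GetOverview(model))) // each overview runs on a thread-pool thread
    .ToList();

ShipmentOverview[] finished = await Task.WhenAll(tasks); // now the await genuinely waits
return finished.ToList();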
If your tasks are CPU bound and not I/O bound, then here is the pattern I believe you're looking for:
static void Main(string[] args) {
    Task firstStepTask = Task.Run(() => firstStep());
    Task secondStepTask = Task.Run(() => secondStep());
    //...
    Task finalStepTask = Task.Factory.ContinueWhenAll(
        new Task[] { firstStepTask, secondStepTask }, //more if more than two steps...
        (previousTasks) => finalStep());
    finalStepTask.Wait();
}

How do you use AsParallel with the async and await keywords?

I was looking at someone's sample code for async and noticed a few issues with the way it was implemented. While looking at the code, I wondered if it would be more efficient to loop through a list using AsParallel rather than just looping through the list normally.
As far as I can tell there is very little difference in performance: both use up every processor, and both take around the same amount of time to complete.
This is the first way of doing it
var tasks= Client.GetClients().Select(async p => await p.Initialize());
And this is the second
var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
Am I correct in assuming there is no difference between the two?
The full program can be found below
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            RunCode1();
            Console.WriteLine("Here");
            Console.ReadLine();

            RunCode2();
            Console.WriteLine("Here");
            Console.ReadLine();
        }

        private async static void RunCode1()
        {
            Stopwatch myStopWatch = new Stopwatch();
            myStopWatch.Start();

            var tasks = Client.GetClients().Select(async p => await p.Initialize());
            Task.WaitAll(tasks.ToArray());

            Console.WriteLine("Time elapsed (ms): " + myStopWatch.ElapsedMilliseconds);
            myStopWatch.Stop();
        }

        private async static void RunCode2()
        {
            Stopwatch myStopWatch = new Stopwatch();
            myStopWatch.Start();

            var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
            Task.WaitAll(tasks.ToArray());

            Console.WriteLine("Time elapsed (ms): " + myStopWatch.ElapsedMilliseconds);
            myStopWatch.Stop();
        }
    }

    class Client
    {
        public static IEnumerable<Client> GetClients()
        {
            for (int i = 0; i < 100; i++)
            {
                yield return new Client() { Id = Guid.NewGuid() };
            }
        }

        public Guid Id { get; set; }

        //This method has to be called before you use a client
        //For the sample, I don't put it on the constructor
        public async Task Initialize()
        {
            await Task.Factory.StartNew(() =>
            {
                Stopwatch timer = new Stopwatch();
                timer.Start();
                while (timer.ElapsedMilliseconds < 1000)
                { }
                timer.Stop();
            });
            Console.WriteLine("Completed: " + Id);
        }
    }
}
There should be very little discernible difference.
In your first case:
var tasks = Client.GetClients().Select(async p => await p.Initialize());
The executing thread will (one at a time) start executing Initialize for each element in the client list. Initialize immediately queues a method to the thread pool and returns an uncompleted Task.
In your second case:
var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
The executing thread will fork to the thread pool and (in parallel) start executing Initialize for each element in the client list. Initialize has the same behavior: it immediately queues a method to the thread pool and returns.
The two timings are nearly identical because you're only parallelizing a small amount of code: the queueing of the method to the thread pool and the return of an uncompleted Task.
If Initialize did some longer (synchronous) work before its first await, it may make sense to use AsParallel.
Remember, all async methods (and lambdas) start out being executed synchronously (see the official FAQ or my own intro post).
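A minimal sketch of that case, with a hypothetical InitializeWithPrelude standing in for an Initialize that does noticeable synchronous work before its first await:
// Hypothetical: the SpinWait stands in for CPU-bound work done before the first await.
static async Task InitializeWithPrelude(Client client)
{
    Thread.SpinWait(50_000_000);                  // synchronous prelude, runs on the calling thread
    Console.WriteLine("Prepared: " + client.Id);
    await Task.Delay(1000);                       // asynchronous remainder, as in the original sample
}

// Without AsParallel the preludes run one after another on the enumerating thread;
// with AsParallel they are spread across thread-pool workers.
var tasks = Client.GetClients()
    .AsParallel()
    .Select(async p => await InitializeWithPrelude(p))
    .ToArray();
Task.WaitAll(tasks);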
There's one major difference.
In the following code, you are taking it upon yourself to perform the partitioning. In other words, you're creating one Task object per item from the IEnumerable<T> that is returned from the call to GetClients():
var tasks= Client.GetClients().Select(async p => await p.Initialize());
In the second, the call to AsParallel is internally going to use Task instances to execute partitions of the IEnumerable<T> and you're going to have the initial Task that is returned from the lambda async p => await p.Initialize():
var tasks = Client.GetClients().AsParallel().
Select(async p => await p.Initialize());
Finally, you're not really doing anything by using async/await here. Granted, the compiler might optimize this out, but you're just waiting on a method that returns a Task and then returning a continuation that does nothing back through the lambda. That said, since the call to Initialize is already returning a Task, it's best to keep it simple and just do:
var tasks = Client.GetClients().Select(p => p.Initialize());
Which will return the sequence of Task instances for you.
To improve on the above two answers, this is the simplest way to get an async/threaded execution that is awaitable:
await Task.WhenAll(Client.GetClients()
    .Select(p => p.Initialize()));
This ensures that the initializations run concurrently and that everything has completed by the time the await finishes. Hope that helps someone. It took me quite a while to figure this out properly, since this is not obvious and the AsParallel() function seems to be what you want but doesn't use async/await.
