Rx: Wait for several observables to complete - c#

I have list of operations to complete and I want to return an observable which is notified when all the observables are completed (returning status of operations will be the best):
foreach (var id in service.FetchItems().ToEnumerable().ToArray())
{
service.Delete(id); // <- returns IObservable<Unit>
}
// something.Wait();
service.FetchItems() returns IObservable<string>, service.Delete(...) returns IObservable<Unit>
Is the following approach correct?
service.FetchItems().ForEachAsync(id => service.Delete(id)).ToObservable().Wait();

I would avoid all awaiting and tasks and just stick with plain RX for this.
Try this approach:
var query =
from id in service.FetchItems()
from u in service.Delete(id)
select id;
query
.ToArray()
.Subscribe(ids =>
{
/* all fetches and deletes done now */
});
The .ToArray() operator in Rx takes an IObservable<T> that returns zero or more T's and returns an IObservable<T[]> that returns a single array that contains zero or more T's only when the source observable completes.

ToEnumerable blocks waiting for the next element in the sequence. You could do:
Task delAllTask = service.FetchItems()
.SelectMany(service.Delete)
.ToTask();
then you can block on the task or continue asynchronously e.g.
delAllTask.Wait();
delAllTask.ContinueWith(...);

Related

Offloading long blocking calls to separate threads using System.Observable

I want to execute a blocking method into separate threads. To illustrate the situation, I created this example:
int LongBlockingCall(int n)
{
Thread.Sleep(1000);
return n + 1;
}
var observable = Observable
.Range(0, 10)
.SelectMany(n =>
Observable.Defer(() => Observable.Return(LongBlockingCall(n), NewThreadScheduler.Default)));
var results = await observable.ToList();
I expected this to run in ~1 second, but it takes ~10 seconds instead.
A new thread should be spawned for each call, because I'm specifying NewThreadScheduler.Default, right?
What am I doing wrong?
You have to replace this line:
Observable.Return(LongBlockingCall(n), NewThreadScheduler.Default)
...with this:
Observable.Start(() => LongBlockingCall(n), NewThreadScheduler.Default)
The Observable.Return just returns a value. It's not aware how this value is produced. In order to invoke an action on a specific scheduler, you need the Observable.Start method.

Is there a way to combine LINQ and async

Basically I have a procedure like
var results = await Task.WhenAll(
from input in inputs
select Task.Run(async () => await InnerMethodAsync(input))
);
.
.
.
private static async Task<Output> InnerMethodAsync(Input input)
{
var x = await Foo(input);
var y = await Bar(x);
var z = await Baz(y);
return z;
}
and I'm wondering whether there's a fancy way to combine this into a single LINQ query that's like an "async stream" (best way I can describe it).
When you use LINQ, there are generally two parts to it: creation and iteration.
Creation:
var query = list.Select( a => a.Name);
These calls are always synchronous. But this code doesn't do much more than create an object that exposes an IEnumerable. The actual work isn't done till later, due to a pattern called deferred execution.
Iteration:
var results = query.ToList();
This code takes the enumerable and gets the value of each item, which typically will involve the invocation of your callback delegates (in this case, a => a.Name ). This is the part that is potentially expensive, and could benefit from asychronousness, e.g. if your callback is something like async a => await httpClient.GetByteArrayAsync(a).
So it's the iteration part that we're interested in, if we want to make it async.
The issue here is that ToList() (and most of the other methods that force iteration, like Any() or Last()) are not asynchronous methods, so your callback delegate will be invoked synchronously, and you’ll end up with a list of tasks instead of the data you want.
We can get around that with a piece of code like this:
public static class ExtensionMethods
{
static public async Task<List<T>> ToListAsync<T>(this IEnumerable<Task<T>> This)
{
var tasks = This.ToList(); //Force LINQ to iterate and create all the tasks. Tasks always start when created.
var results = new List<T>(); //Create a list to hold the results (not the tasks)
foreach (var item in tasks)
{
results.Add(await item); //Await the result for each task and add to results list
}
return results;
}
}
With this extension method, we can rewrite your code:
var results = await inputs.Select( async i => await InnerMethodAsync(i) ).ToListAsync();
^That should give you the async behavior you're looking for, and avoids creating thread pool tasks, as your example does.
Note: If you are using LINQ-to-entities, the expensive part (the data retrieval) isn't exposed to you. For LINQ-to-entities, you'd want to use the ToListAsync() that comes with the EF framework instead.
Try it out and see the timings in my demo on DotNetFiddle.
A rather obvious answer, but you have just used LINQ and async together - you're using LINQ's select to project, and start, a bunch of async Tasks, and then await on the results, which provides an asynchronous parallelism pattern.
Although you've likely just provided a sample, there are a couple of things to note in your code (I've switched to Lambda syntax, but the same principals apply)
Since there's basically zero CPU bound work on each Task before the first await (i.e. no work done before var x = await Foo(input);), there's no real reason to use Task.Run here.
And since there's no work to be done in the lambda after call to InnerMethodAsync, you don't need to wrap the InnerMethodAsync calls in an async lambda (but be wary of IDisposable)
i.e. You can just select the Task returned from InnerMethodAsync and await these with Task.WhenAll.
var tasks = inputs
.Select(input => InnerMethodAsync(input)) // or just .Select(InnerMethodAsync);
var results = await Task.WhenAll(tasks);
More complex patterns are possible with asynchronony and Linq, but rather than reinventing the wheel, you should have a look at Reactive Extensions, and the TPL Data Flow Library, which have many building blocks for complex flows.
Try using Microsoft's Reactive Framework. Then you can do this:
IObservable<Output[]> query =
from input in inputs.ToObservable()
from x in Observable.FromAsync(() => Foo(input))
from y in Observable.FromAsync(() => Bar(x))
from z in Observable.FromAsync(() => Baz(y))
select z;
Output[] results = await query.ToArray();
Simple.
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your code.

Async Await in Lambda expression where clause

I would want to call an async method inside lambda expression. Please help me doing the below
eg -
return xyz.Where(async x=> await AsyncMethodCall(x.val));
And the Async method looks like
public async Task<bool> AsyncMethodCall(Data d){...}
When I do the above, I get the following error
Error CS4010 Cannot convert async lambda expression to delegate type
'Func<Data, bool>'. An async lambda expression may return void, Task
or Task<T>, none of which are convertible to 'Func<Data, bool>'.
Thanks in advance for the help
Asynchronous sequences are tricky because you really have to think about what specifically you want the code to do.
For example, you could want to execute the AsyncMethodCall calls sequentially, and then return all the results:
var result = new List<T>();
foreach (var d in xyz)
if (await AsyncMethodCall(d.val))
result.Add(d);
return result;
Or, you could execute all the AsyncMethodCall calls concurrently, and then collect and return the results (again, all at once):
var tasks = xyz.Select(async d => new { d, filter = await AsyncMethodCall(d.val) });
var results = await Task.WhenAll(tasks);
return results.Where(x => x.filter).Select(x => x.d);
Or, you could execute all the AsyncMethodCall calls sequentially, and produce the results one at a time. This approach is incompatible with IEnumerable<T> (assuming you want to keep the call asynchronous). If you want to produce a sequence where AsyncMethodCall is asynchronously invoked during sequence enumeration, then you would need to change to IAsyncEnumerable<T>. If you want to produce a sequence that is started by the consumer and then produces results on its own, you would need to change to IObservable<T>.
Or, you could execute all the AsyncMethodCall calls concurrently, and produce the results one at a time. This is also incompatible with IEnumerable<T>; you would need to change to IObservable<T>. And you would also need to decide whether to maintain the original ordering, or to produce them in order of AsyncMethodCall completing.

Counting Non-Faulted Tasks causes re-execution of each task

I am saving a bunch of items to my database using async saves
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
});
await Task.WhenAll(tasks);
I need to verify how many times SaveAsync was successful (It throws and exception if something goes wrong). I am using IsFaulted flag to examine the tasks:
var successCount = tasks.Count(t => !t.IsFaulted);
Collection items consists of 3 elements so SaveAsync should have been called three times but it is executed 6 times. Upon closer examination I noticed that counting non-faulted tasks with c.Count(...) causes each of the task to re-run.
I suspect it has something to do with deferred LINQ execution but I am not sure why exactly and how to fix this.
Any suggestion why I observe this behavior and what would be the optimal pattern to avoid this artifact?
It happens because of multiple enumeration of your Select query.
In order to fix it, force enumeration by calling ToList() method. Then it will work correctly.
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
})
.ToList();
Also you may take a look at these more detailed answers:
https://stackoverflow.com/a/8240935/3872935
https://stackoverflow.com/a/20129161/3872935.

Parallel Linq - return first result that comes back

I'm using PLINQ to run a function that tests serial ports to determine if they're a GPS device.
Some serial ports immediately are found to be a valid GPS. In this case, I want the first one to complete the test to be the one returned. I don't want to wait for the rest of the results.
Can I do this with PLINQ, or do I have to schedule a batch of tasks and wait for one to return?
PLINQ is probably not going to suffice here. While you can use .First, in .NET 4, this will cause it to run sequentially, which defeats the purpose. (Note that this will be improved in .NET 4.5.)
The TPL, however, is most likely the right answer here. You can create a Task<Location> for each serial port, and then use Task.WaitAny to wait on the first successful operation.
This provides a simple way to schedule a bunch of "tasks" and then just use the first result.
I have been thinking about this on and off for the past couple days and I can't find a built in PLINQ way to do this in C# 4.0. The accepted answer to this question of using FirstOrDefault does not return a value until the full PLINQ query is complete and still returns the (ordered) first result. The following extreme example shows the behavior:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
// waits until all results are in, then returns first
q.FirstOrDefault().Dump("result");
I don't see a built-in way to immediately get the first available result, but I was able to come up with two workarounds.
The first creates Tasks to do the work and returns the Task, resulting in a quickly completed PLINQ query. The resulting tasks can be passed to WaitAny to get the first result as soon as it is available:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
return Task.Factory.StartNew(() =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
});
cts.CancelAfter(5000);
// returns as soon as the tasks are created
var ts = q.ToArray();
// wait till the first task finishes
var idx = Task.WaitAny( ts );
ts[idx].Result.Dump("res");
This is probably a terrible way to do it. Since the actual work of the PLINQ query is just a very fast Task.Factory.StartNew, it's pointless to use PLINQ at all. A simple .Select( i => Task.Factory.StartNew( ... on the IEnumerable is cleaner and probably faster.
The second workaround uses a queue (BlockingCollection) and just inserts results into this queue once they are computed:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
.WithCancellation(cts.Token).WithMergeOptions( ParallelMergeOptions.NotBuffered).WithDegreeOfParallelism(10).AsUnordered()
.Where(i => i % 2 == 0 )
.Select( i =>
{
if( i == 0 )
Thread.Sleep(3000);
else
Thread.Sleep(rnd.Value.Next(50, 100));
return string.Format("dat {0}", i).Dump();
});
cts.CancelAfter(5000);
var qu = new BlockingCollection<string>();
// ForAll blocks until PLINQ query is complete
Task.Factory.StartNew(() => q.ForAll( x => qu.Add(x) ));
// get first result asap
qu.Take().Dump("result");
With this method, the work is done using PLINQ, and the BlockingCollecion's Take() will return the first result as soon as it is inserted by the PLINQ query.
While this produces the desired result, I am not sure it has any advantage over just using the simpler Tasks + WaitAny
Upon further review, you can apparently just use FirstOrDefault to solve this. PLINQ will not preserve ordering by default, and with an unbuffered query, will return immediately.
http://msdn.microsoft.com/en-us/library/dd460677.aspx
To accomplish this entirely with PLINQ in .NET 4.0:
SerialPorts. // Your IEnumerable of serial ports
AsParallel().AsUnordered(). // Run as an unordered parallel query
Where(IsGps). // Matching the predicate IsGps (Func<SerialPort, bool>)
Take(1). // Taking the first match
FirstOrDefault(); // And unwrap it from the IEnumerable (or null if none are found
The key is to not use an ordered evaluation like First or FirstOrDefault until you have specified that you only care to find one.

Categories