I am saving a bunch of items to my database using async saves
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
});
await Task.WhenAll(tasks);
I need to verify how many times SaveAsync was successful (It throws and exception if something goes wrong). I am using IsFaulted flag to examine the tasks:
var successCount = tasks.Count(t => !t.IsFaulted);
Collection items consists of 3 elements so SaveAsync should have been called three times but it is executed 6 times. Upon closer examination I noticed that counting non-faulted tasks with c.Count(...) causes each of the task to re-run.
I suspect it has something to do with deferred LINQ execution but I am not sure why exactly and how to fix this.
Any suggestion why I observe this behavior and what would be the optimal pattern to avoid this artifact?
It happens because of multiple enumeration of your Select query.
In order to fix it, force enumeration by calling ToList() method. Then it will work correctly.
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
})
.ToList();
Also you may take a look at these more detailed answers:
https://stackoverflow.com/a/8240935/3872935
https://stackoverflow.com/a/20129161/3872935.
Related
I need to bulk load two groups of data to Neo4j. The second group builds on the nodes and relationships created by the first group, so the second group cannot be loaded until the first group is fully committed.
My current implementation reads like the following.
public override void Import()
{
using var session = _driver.AsyncSession(
x => x.WithDefaultAccessMode(AccessMode.Write));
var groupA = session.WriteTransactionAsync(async x =>
{
var result = await x.RunAsync(queryA);
return result.ToListAsync();
});
groupA.Result.Wait();
var groupB = session.WriteTransactionAsync(async x =>
{
var result = await x.RunAsync(queryB);
return result.ToListAsync();
});
groupB.Result.Wait();
}
Note that I cannot change the method to be async; hence I am not using await.
This method sometimes runs correctly, but sometimes I get an error that reads like the following.
Cannot access records on this result any more as the result has already been consumed or the query runner where the result is created has already been closed.
I can switch to using the following instead:
groupA.Wait();
Though the issue with this is that the Status of groupA can be RanToCompletion while the Status of the Result is Faulted (e.g., when there is a syntax error with the query). Hence, any possible errors during the execution of queryA are not captured, and I guess that requires checking groupA.Result.Exceptions.
Given these challenges, I am not sure what is the recommended way of waiting for dependent Neo4j tasks in a synchronous method.
What you need is to await the result.ToListAsync in the scope of the Func. Your transaction is completing before you read the results, and after the transaction is closed you can no longer access the results.
var groupA = session.WriteTransactionAsync(async x =>
{
var result = await x.RunAsync(queryA);
return await result.ToListAsync();
});
groupA.Wait();
var groupB = session.WriteTransactionAsync(async x =>
{
var result = await x.RunAsync(queryB);
return await result.ToListAsync();
});
groupB.Wait();
The reason it sometimes completes successfully is because the result.ToListAsync task completes before your transaction has been commited & closed.
How can I set a Task.WhenAll() result to a value within a Task.WhenAll() routine? For example, the following code will get all authorized users for all domain/group combos in appRoleMaps:
var results = await Task.WhenAll(appRoleMaps.Select(
x => GetAuthorizedUsers(x.DomainName, x.ADGroupName)));
But what if I want to set the authorized users results of each iteration as the iteration item Authorized property value? For example, something like the following code (although the following code does not work):
var results = await Task.WhenAll(appRoleMaps.Select(
x => x.AuthorizedUsers = GetAuthorizedUsers(x.DomainName, x.ADGroupName)));
Is there a streamlined way to do this? Reason being, the GetAuthorizedUsers result set does not include domain/group info, so I can't just do a simple foreach/where at the end to easily join by this info.
You could create an async lambda to pass into your Select. This would cause each result to be assigned to the AuthorizedUsers property on the associated instance. The outer Task.WhenAll is only required to know when all elements have been processed.
await Task.WhenAll(appRoleMaps.Select(async x =>
x.AuthorizedUsers = await GetAuthorizedUsers(x.DomainName, x.ADGroupName)));
Save the tasks and query them after waiting:
var tasks = appRoleMaps.Select(x => GetAuthorizedUsers(x.DomainName, x.ADGroupName)).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result).ToList();
This is cleaner than relying on side effects. Squirreling the value away into some property and later extracting it obfuscates the meaning of the code and is more work to code.
Basically I have a procedure like
var results = await Task.WhenAll(
from input in inputs
select Task.Run(async () => await InnerMethodAsync(input))
);
.
.
.
private static async Task<Output> InnerMethodAsync(Input input)
{
var x = await Foo(input);
var y = await Bar(x);
var z = await Baz(y);
return z;
}
and I'm wondering whether there's a fancy way to combine this into a single LINQ query that's like an "async stream" (best way I can describe it).
When you use LINQ, there are generally two parts to it: creation and iteration.
Creation:
var query = list.Select( a => a.Name);
These calls are always synchronous. But this code doesn't do much more than create an object that exposes an IEnumerable. The actual work isn't done till later, due to a pattern called deferred execution.
Iteration:
var results = query.ToList();
This code takes the enumerable and gets the value of each item, which typically will involve the invocation of your callback delegates (in this case, a => a.Name ). This is the part that is potentially expensive, and could benefit from asychronousness, e.g. if your callback is something like async a => await httpClient.GetByteArrayAsync(a).
So it's the iteration part that we're interested in, if we want to make it async.
The issue here is that ToList() (and most of the other methods that force iteration, like Any() or Last()) are not asynchronous methods, so your callback delegate will be invoked synchronously, and you’ll end up with a list of tasks instead of the data you want.
We can get around that with a piece of code like this:
public static class ExtensionMethods
{
static public async Task<List<T>> ToListAsync<T>(this IEnumerable<Task<T>> This)
{
var tasks = This.ToList(); //Force LINQ to iterate and create all the tasks. Tasks always start when created.
var results = new List<T>(); //Create a list to hold the results (not the tasks)
foreach (var item in tasks)
{
results.Add(await item); //Await the result for each task and add to results list
}
return results;
}
}
With this extension method, we can rewrite your code:
var results = await inputs.Select( async i => await InnerMethodAsync(i) ).ToListAsync();
^That should give you the async behavior you're looking for, and avoids creating thread pool tasks, as your example does.
Note: If you are using LINQ-to-entities, the expensive part (the data retrieval) isn't exposed to you. For LINQ-to-entities, you'd want to use the ToListAsync() that comes with the EF framework instead.
Try it out and see the timings in my demo on DotNetFiddle.
A rather obvious answer, but you have just used LINQ and async together - you're using LINQ's select to project, and start, a bunch of async Tasks, and then await on the results, which provides an asynchronous parallelism pattern.
Although you've likely just provided a sample, there are a couple of things to note in your code (I've switched to Lambda syntax, but the same principals apply)
Since there's basically zero CPU bound work on each Task before the first await (i.e. no work done before var x = await Foo(input);), there's no real reason to use Task.Run here.
And since there's no work to be done in the lambda after call to InnerMethodAsync, you don't need to wrap the InnerMethodAsync calls in an async lambda (but be wary of IDisposable)
i.e. You can just select the Task returned from InnerMethodAsync and await these with Task.WhenAll.
var tasks = inputs
.Select(input => InnerMethodAsync(input)) // or just .Select(InnerMethodAsync);
var results = await Task.WhenAll(tasks);
More complex patterns are possible with asynchronony and Linq, but rather than reinventing the wheel, you should have a look at Reactive Extensions, and the TPL Data Flow Library, which have many building blocks for complex flows.
Try using Microsoft's Reactive Framework. Then you can do this:
IObservable<Output[]> query =
from input in inputs.ToObservable()
from x in Observable.FromAsync(() => Foo(input))
from y in Observable.FromAsync(() => Bar(x))
from z in Observable.FromAsync(() => Baz(y))
select z;
Output[] results = await query.ToArray();
Simple.
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your code.
This question already has answers here:
Async await in linq select
(8 answers)
Closed 5 months ago.
After reading this post:
Nesting await in Parallel.ForEach
I tried to do the following:
private static async void Solution3UsingLinq()
{
var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var id = await repo.getCustomer(i);
Console.WriteLine(id);
});
}
For some reason, this doesn't work... I don't understand why, I think there is a deadlock but i'm not sure...
So at the end of your method, customerTasks contains an IEnumerable<Task> that has not been enumerated. None of the code within the Select even runs.
When creating tasks like this, it's probably safer to materialize your sequence immediately to mitigate the risk of double enumeration (and creating a second batch of tasks by accident). You can do this by calling ToList on your sequence.
So:
var customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var id = await repo.getCustomer(i); //consider changing to GetCustomerAsync
Console.WriteLine(id);
}).ToList();
Now... what to do with your list of tasks? You need to wait for them all to complete...
You can do this with Task.WhenAll:
await Task.WhenAll(customerTasks);
You could take this a step further by actually returning a value from your async delegate in the Select statement, so you end up with an IEnumerable<Task<Customer>>.
Then you can use a different overload of Task.WhenAll:
IEnumerable<Task<Customer>> customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var c = await repo.getCustomer(i); //consider changing to GetCustomerAsync
return c;
}).ToList();
Customer[] customers = await Task.WhenAll(customerTasks); //look... all the customers
Of course, there are probably more efficient means of getting several customers in one go, but that would be for a different question.
If instead, you'd like to perform your async tasks in sequence then:
var customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var id = await repo.getCustomer(i); //consider changing to GetCustomerAsync
Console.WriteLine(id);
});
foreach(var task in customerTasks) //items in sequence will be materialized one-by-one
{
await task;
}
Addition:
There seems to be some confusion about when the the LINQ
statements actually are executed, especially the Where statement.
I created a small program to show when the
source data actually is accessed
Results at the end of this answer
end of Addition
You have to be aware about the lazyness of most LINQ functions.
Lazy LINQ functions will only change the Enumerator that IEnumerable.GetEnumerator() will return when you start enumerating. Hence, as long as you call lazy LINQ functions the query isn't executed.
Only when you starte enumerating, the query is executed. Enumerating starts when you call foreach, or non-layzy LINQ functions like ToList(), Any(), FirstOrDefault(), Max(), etc.
In the comments section of every LINQ function is described whether the function is lazy or not. You can also see whether the function is lazy by inspecting the return value. If it returns an IEnumerable<...> (or IQueryable) the LINQ is not enumerated yet.
The nice thing about this lazyness, is that as long as you use only lazy functions, changing the LINQ expression is not time consuming. Only when you use non-lazy functions, you have to be aware of the impact.
For instance, if fetching the first element of a sequence takes a long time to calculate, because of Ordering, Grouping, Database queries etc, make sure you don't start enumerating more then once (= don't use non-lazy functions for the same sequence more than once)
Don't do this at home:
Suppose you have the following query
var query = toDoLists
.Where(todo => todo.Person == me)
.GroupBy(todo => todo.Priority)
.Select(todoGroup => new
{
Priority = todoGroup.Key,
Hours = todoGroup.Select(todo => todo.ExpectedWorkTime).Sum(),
}
.OrderByDescending(work => work.Priority)
.ThenBy(work => work.WorkCount);
This query contains only lazy LINQ functions. After all these statement, the todoLists have not been accessed yet.
But as soon as you get the first element of the resulting sequence all elements have to be accessed (probably more than once) to group them by priority, calculate the total number of involved working hours and to sort them by descending priority.
This is the case for Any(), and again for First():
if (query.Any()) // do grouping, summing, ordering
{
var highestOnTodoList = query.First(); // do all work again
Process(highestOnTodoList);
}
else
{ // nothing to do
GoFishing();
}
In such cases it is better to use the correct function:
var highestOnToDoList = query.FirstOrDefault(); // do grouping / summing/ ordering
if (highestOnTioDoList != null)
etc.
back to your question
The Enumerable.Select statement only created an IEnumerable object for you. You forgot to enumerate over it.
Besides you constructed your CustomerRepo several times. Was that intended?
ICustomerRepo repo = new CustomerRepo();
IEnumerable<Task<CustomerRepo>> query = ids.Select(id => repo.getCustomer(i));
foreach (var task in query)
{
id = await task;
Console.WriteLine(id);
}
Addition: when are the LINQ statements executed?
I created a small program to test when a LINQ statement is executed, especially when a Where is executed.
A function that returns an IEnumerable:
IEnumerable<int> GetNumbers()
{
for (int i=0; i<10; ++i)
{
yield return i;
}
}
A program that uses this enumeration using an old fashioned Enumerator
public static void Main()
{
IEnumerable<int> number = GetNumbers();
IEnumerable<int> smallNumbers = numbers.Where(number => number < 3);
IEnumerator<int> smallEnumerator = smallNumbers.GetEnumerator();
bool smallNumberAvailable = smallEnumerator.MoveNext();
while (smallNumberAvailable)
{
int smallNumber = smallEnumerator.Current;
Console.WriteLine(smallNumber);
smallNumberAvailable = smallEnumerator.MoveNext();
}
}
During debugging I can see that the first time GetNumbers is executed the first time that MoveNext() is called. GetNumbers() is executed until the first yield return statement.
Every time that MoveNext() is called the statements after the yield return are performed until the next yield return is executed.
Changing the code such that the enumerator is accessed using foreach, Any(), FirstOrDefault(), ToDictionary, etc, shows that the calls to these functions are the time that the originating source is actually accessed.
if (smallNumbers.Any())
{
int x = smallNumbers.First();
Console.WriteLine(x);
}
Debugging shows that the originating source starts enumerating from the beginning twice. So indeed, it is not wise to do this, especially if you need to do a lot to calculate the first element (GroupBy, OrderBy, Database access, etc)
I have the below method:
private static List<List<job>> SplitJobsByMonth(IEnumerable<job> inactiveJobs)
{
List<List<job>> jobsByMonth = new List<List<job>>();
DateTime cutOff = DateTime.Now.Date.AddMonths(-1).Date;
cutOff = cutOff.AddDays(-cutOff.Day + 1);
List<job> temp;
while (inactiveJobs.Count() > 0)
{
temp = inactiveJobs.Where(j => j.completeddt >= cutOff).ToList();
jobsByMonth.Add(temp);
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
cutOff = cutOff.AddMonths(-1);
}
return jobsByMonth;
}
It aims to split the jobs by month. 'job' is a class, not a struct. In the while loop, the passed in IEnumerable is reset with each iteration to remove the jobs that have been processed:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
Typically this reduces the content of this collection by quite a lot. However, on the next iteration the line:
temp = inactiveJobs.Where(j => j.completeddt >= cutOff).ToList();
restores the inactiveJobs object to the state it was when it was passed into the method - so the collection is full again.
I have solved this problem by refactoring this method slightly, but I am curious as to why this issue occurs as I can't explain it. Can anyone explain why this is happening?
Why not just use a group by?
private static List<List<job>> SplitJobsByMonth(IEnumerable<job> inactiveJobs)
{
var jobsByMonth = (from job in inactiveJobs
group job by new DateTime(job.completeddt.Year, job.completeddt.Month, 1)
into g
select g.ToList()).ToList();
return jobsByMonth;
}
This happens because of deferred execution of LINQ's Where.
When you do this
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
no evaluation is actually happening until you start iterating the IEnumerable. If you add ToList after Where, the iteration would happen right away, so the content of interactiveJobs would be reduced:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a)).ToList();
In LINQ, queries have two different behaviors of execution: immediate and deferred.
The query is actually executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution.
You can also force a query to execute immediately, which is useful for caching query results.
In order to make this add .ToList() in the end of your line:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a)).ToList();
This executes the created query immediately and writes result to your variable.
You can see more about this example Here.