This question already has answers here:
Async await in linq select
(8 answers)
Closed 5 months ago.
After reading this post:
Nesting await in Parallel.ForEach
I tried to do the following:
private static async void Solution3UsingLinq()
{
var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var id = await repo.getCustomer(i);
Console.WriteLine(id);
});
}
For some reason, this doesn't work... I don't understand why, I think there is a deadlock but i'm not sure...
So at the end of your method, customerTasks contains an IEnumerable<Task> that has not been enumerated. None of the code within the Select even runs.
When creating tasks like this, it's probably safer to materialize your sequence immediately to mitigate the risk of double enumeration (and creating a second batch of tasks by accident). You can do this by calling ToList on your sequence.
So:
var customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var id = await repo.getCustomer(i); //consider changing to GetCustomerAsync
Console.WriteLine(id);
}).ToList();
Now... what to do with your list of tasks? You need to wait for them all to complete...
You can do this with Task.WhenAll:
await Task.WhenAll(customerTasks);
You could take this a step further by actually returning a value from your async delegate in the Select statement, so you end up with an IEnumerable<Task<Customer>>.
Then you can use a different overload of Task.WhenAll:
IEnumerable<Task<Customer>> customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var c = await repo.getCustomer(i); //consider changing to GetCustomerAsync
return c;
}).ToList();
Customer[] customers = await Task.WhenAll(customerTasks); //look... all the customers
Of course, there are probably more efficient means of getting several customers in one go, but that would be for a different question.
If instead, you'd like to perform your async tasks in sequence then:
var customerTasks = ids.Select(async i =>
{
ICustomerRepo repo = new CustomerRepo();
var id = await repo.getCustomer(i); //consider changing to GetCustomerAsync
Console.WriteLine(id);
});
foreach(var task in customerTasks) //items in sequence will be materialized one-by-one
{
await task;
}
Addition:
There seems to be some confusion about when the the LINQ
statements actually are executed, especially the Where statement.
I created a small program to show when the
source data actually is accessed
Results at the end of this answer
end of Addition
You have to be aware about the lazyness of most LINQ functions.
Lazy LINQ functions will only change the Enumerator that IEnumerable.GetEnumerator() will return when you start enumerating. Hence, as long as you call lazy LINQ functions the query isn't executed.
Only when you starte enumerating, the query is executed. Enumerating starts when you call foreach, or non-layzy LINQ functions like ToList(), Any(), FirstOrDefault(), Max(), etc.
In the comments section of every LINQ function is described whether the function is lazy or not. You can also see whether the function is lazy by inspecting the return value. If it returns an IEnumerable<...> (or IQueryable) the LINQ is not enumerated yet.
The nice thing about this lazyness, is that as long as you use only lazy functions, changing the LINQ expression is not time consuming. Only when you use non-lazy functions, you have to be aware of the impact.
For instance, if fetching the first element of a sequence takes a long time to calculate, because of Ordering, Grouping, Database queries etc, make sure you don't start enumerating more then once (= don't use non-lazy functions for the same sequence more than once)
Don't do this at home:
Suppose you have the following query
var query = toDoLists
.Where(todo => todo.Person == me)
.GroupBy(todo => todo.Priority)
.Select(todoGroup => new
{
Priority = todoGroup.Key,
Hours = todoGroup.Select(todo => todo.ExpectedWorkTime).Sum(),
}
.OrderByDescending(work => work.Priority)
.ThenBy(work => work.WorkCount);
This query contains only lazy LINQ functions. After all these statement, the todoLists have not been accessed yet.
But as soon as you get the first element of the resulting sequence all elements have to be accessed (probably more than once) to group them by priority, calculate the total number of involved working hours and to sort them by descending priority.
This is the case for Any(), and again for First():
if (query.Any()) // do grouping, summing, ordering
{
var highestOnTodoList = query.First(); // do all work again
Process(highestOnTodoList);
}
else
{ // nothing to do
GoFishing();
}
In such cases it is better to use the correct function:
var highestOnToDoList = query.FirstOrDefault(); // do grouping / summing/ ordering
if (highestOnTioDoList != null)
etc.
back to your question
The Enumerable.Select statement only created an IEnumerable object for you. You forgot to enumerate over it.
Besides you constructed your CustomerRepo several times. Was that intended?
ICustomerRepo repo = new CustomerRepo();
IEnumerable<Task<CustomerRepo>> query = ids.Select(id => repo.getCustomer(i));
foreach (var task in query)
{
id = await task;
Console.WriteLine(id);
}
Addition: when are the LINQ statements executed?
I created a small program to test when a LINQ statement is executed, especially when a Where is executed.
A function that returns an IEnumerable:
IEnumerable<int> GetNumbers()
{
for (int i=0; i<10; ++i)
{
yield return i;
}
}
A program that uses this enumeration using an old fashioned Enumerator
public static void Main()
{
IEnumerable<int> number = GetNumbers();
IEnumerable<int> smallNumbers = numbers.Where(number => number < 3);
IEnumerator<int> smallEnumerator = smallNumbers.GetEnumerator();
bool smallNumberAvailable = smallEnumerator.MoveNext();
while (smallNumberAvailable)
{
int smallNumber = smallEnumerator.Current;
Console.WriteLine(smallNumber);
smallNumberAvailable = smallEnumerator.MoveNext();
}
}
During debugging I can see that the first time GetNumbers is executed the first time that MoveNext() is called. GetNumbers() is executed until the first yield return statement.
Every time that MoveNext() is called the statements after the yield return are performed until the next yield return is executed.
Changing the code such that the enumerator is accessed using foreach, Any(), FirstOrDefault(), ToDictionary, etc, shows that the calls to these functions are the time that the originating source is actually accessed.
if (smallNumbers.Any())
{
int x = smallNumbers.First();
Console.WriteLine(x);
}
Debugging shows that the originating source starts enumerating from the beginning twice. So indeed, it is not wise to do this, especially if you need to do a lot to calculate the first element (GroupBy, OrderBy, Database access, etc)
Related
I am trying to get data from linq in asp.net core. I have a table with a Position with a FacultyID field, how do I get it from the Position table with an existing userid. My query
var claimsIdentity = _httpContextAccessor.HttpContext.User.Identity as ClaimsIdentity;
var userId = claimsIdentity.FindFirst(ClaimTypes.NameIdentifier)?.Value.ToString();
var data = _context.Positions.Where(p => p.UserID.ToString() == userId).Select(x => x.FacultyID).???;
What can I add after the mark? to get the data. Thank you so much
There are several things you can do. An example in your case would be:
var data = _context.Positions.Where(p => p.UserID.ToString() == userId).Select(x => x.FacultyID).FirstOrDefault();
If you expect more than 1 results, then you would do:
var data = _context.Positions.Where(p => p.UserID.ToString() == userId).Select(x => x.FacultyID).ToList();
You have to be aware of the difference between a query and the result of a query.
The query does not represent the data itself, it represents the potential to fetch some data.
If you look closely to the LINQ methods, you will find there are two groups: the LINQ methods that return IQueryable<...> and the others.
The IQueryable methods don't execute the query. These functions are called lazy, they use deferred execution. You can find these terms in the remarks section of every LINQ method.
As long as you concatenate IQueryable LINQ methods, the query is not executed. It is not costly to concatenate LINQ methods in separate statements.
The query is executed as soon as you start enumerating the query. At its lowest level this is done using GetEnumerator and MoveNext / Current:
IQueryable<Customer> customers = ...; // Query not executed yet!
// execute the query and process the fetched data
using (IEnumerator<Customer> enumerator = customers.GetEnumerator())
{
while(enumerator.MoveNext())
{
// there is a Customer, it is in property Current:
Customer customer = enumerator.Current;
this.ProcessFetchedCustomer(customer);
}
}
This code, or something very similar is done when you use foreach, or one of the LINQ methods that don't return IQueryable<...>, like ToList, ToDictionary, FirstOrDefault, Sum, Any, ...
var data = dbContext.Positions
.Where(p => p.UserID.ToString() == userId)
.Select(x => x.FacultyID);
If you use your debugger, you will see that data is an IQueryable<Position>. You'll have to use one of the other LINQ methods to execute the query.
To get all Positions in the query:
List<Position> fetchedPositions result = data.ToList();
If you expect only one position:
Position fetchedPosition = data.FirstOrDefault();
If you want to know if there is any position at all:
if (positionAvailable = data.Any())
{
...
}
Be aware: if you use the IQueryable, the data will be fetched again from the DbContext. So if you want to do all three statements efficiently these, make sure you don't use the original data three times:
List<Position> fetchedPositions result = data.ToList();
Position firstPosition = fetchedPostion.FirstOrDefault();
if (firstPosition != null)
{
ProcessPosition(firstPosition);
}
Basically I have a procedure like
var results = await Task.WhenAll(
from input in inputs
select Task.Run(async () => await InnerMethodAsync(input))
);
.
.
.
private static async Task<Output> InnerMethodAsync(Input input)
{
var x = await Foo(input);
var y = await Bar(x);
var z = await Baz(y);
return z;
}
and I'm wondering whether there's a fancy way to combine this into a single LINQ query that's like an "async stream" (best way I can describe it).
When you use LINQ, there are generally two parts to it: creation and iteration.
Creation:
var query = list.Select( a => a.Name);
These calls are always synchronous. But this code doesn't do much more than create an object that exposes an IEnumerable. The actual work isn't done till later, due to a pattern called deferred execution.
Iteration:
var results = query.ToList();
This code takes the enumerable and gets the value of each item, which typically will involve the invocation of your callback delegates (in this case, a => a.Name ). This is the part that is potentially expensive, and could benefit from asychronousness, e.g. if your callback is something like async a => await httpClient.GetByteArrayAsync(a).
So it's the iteration part that we're interested in, if we want to make it async.
The issue here is that ToList() (and most of the other methods that force iteration, like Any() or Last()) are not asynchronous methods, so your callback delegate will be invoked synchronously, and you’ll end up with a list of tasks instead of the data you want.
We can get around that with a piece of code like this:
public static class ExtensionMethods
{
static public async Task<List<T>> ToListAsync<T>(this IEnumerable<Task<T>> This)
{
var tasks = This.ToList(); //Force LINQ to iterate and create all the tasks. Tasks always start when created.
var results = new List<T>(); //Create a list to hold the results (not the tasks)
foreach (var item in tasks)
{
results.Add(await item); //Await the result for each task and add to results list
}
return results;
}
}
With this extension method, we can rewrite your code:
var results = await inputs.Select( async i => await InnerMethodAsync(i) ).ToListAsync();
^That should give you the async behavior you're looking for, and avoids creating thread pool tasks, as your example does.
Note: If you are using LINQ-to-entities, the expensive part (the data retrieval) isn't exposed to you. For LINQ-to-entities, you'd want to use the ToListAsync() that comes with the EF framework instead.
Try it out and see the timings in my demo on DotNetFiddle.
A rather obvious answer, but you have just used LINQ and async together - you're using LINQ's select to project, and start, a bunch of async Tasks, and then await on the results, which provides an asynchronous parallelism pattern.
Although you've likely just provided a sample, there are a couple of things to note in your code (I've switched to Lambda syntax, but the same principals apply)
Since there's basically zero CPU bound work on each Task before the first await (i.e. no work done before var x = await Foo(input);), there's no real reason to use Task.Run here.
And since there's no work to be done in the lambda after call to InnerMethodAsync, you don't need to wrap the InnerMethodAsync calls in an async lambda (but be wary of IDisposable)
i.e. You can just select the Task returned from InnerMethodAsync and await these with Task.WhenAll.
var tasks = inputs
.Select(input => InnerMethodAsync(input)) // or just .Select(InnerMethodAsync);
var results = await Task.WhenAll(tasks);
More complex patterns are possible with asynchronony and Linq, but rather than reinventing the wheel, you should have a look at Reactive Extensions, and the TPL Data Flow Library, which have many building blocks for complex flows.
Try using Microsoft's Reactive Framework. Then you can do this:
IObservable<Output[]> query =
from input in inputs.ToObservable()
from x in Observable.FromAsync(() => Foo(input))
from y in Observable.FromAsync(() => Bar(x))
from z in Observable.FromAsync(() => Baz(y))
select z;
Output[] results = await query.ToArray();
Simple.
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your code.
I have the below method:
private static List<List<job>> SplitJobsByMonth(IEnumerable<job> inactiveJobs)
{
List<List<job>> jobsByMonth = new List<List<job>>();
DateTime cutOff = DateTime.Now.Date.AddMonths(-1).Date;
cutOff = cutOff.AddDays(-cutOff.Day + 1);
List<job> temp;
while (inactiveJobs.Count() > 0)
{
temp = inactiveJobs.Where(j => j.completeddt >= cutOff).ToList();
jobsByMonth.Add(temp);
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
cutOff = cutOff.AddMonths(-1);
}
return jobsByMonth;
}
It aims to split the jobs by month. 'job' is a class, not a struct. In the while loop, the passed in IEnumerable is reset with each iteration to remove the jobs that have been processed:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
Typically this reduces the content of this collection by quite a lot. However, on the next iteration the line:
temp = inactiveJobs.Where(j => j.completeddt >= cutOff).ToList();
restores the inactiveJobs object to the state it was when it was passed into the method - so the collection is full again.
I have solved this problem by refactoring this method slightly, but I am curious as to why this issue occurs as I can't explain it. Can anyone explain why this is happening?
Why not just use a group by?
private static List<List<job>> SplitJobsByMonth(IEnumerable<job> inactiveJobs)
{
var jobsByMonth = (from job in inactiveJobs
group job by new DateTime(job.completeddt.Year, job.completeddt.Month, 1)
into g
select g.ToList()).ToList();
return jobsByMonth;
}
This happens because of deferred execution of LINQ's Where.
When you do this
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
no evaluation is actually happening until you start iterating the IEnumerable. If you add ToList after Where, the iteration would happen right away, so the content of interactiveJobs would be reduced:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a)).ToList();
In LINQ, queries have two different behaviors of execution: immediate and deferred.
The query is actually executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution.
You can also force a query to execute immediately, which is useful for caching query results.
In order to make this add .ToList() in the end of your line:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a)).ToList();
This executes the created query immediately and writes result to your variable.
You can see more about this example Here.
I am saving a bunch of items to my database using async saves
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
});
await Task.WhenAll(tasks);
I need to verify how many times SaveAsync was successful (It throws and exception if something goes wrong). I am using IsFaulted flag to examine the tasks:
var successCount = tasks.Count(t => !t.IsFaulted);
Collection items consists of 3 elements so SaveAsync should have been called three times but it is executed 6 times. Upon closer examination I noticed that counting non-faulted tasks with c.Count(...) causes each of the task to re-run.
I suspect it has something to do with deferred LINQ execution but I am not sure why exactly and how to fix this.
Any suggestion why I observe this behavior and what would be the optimal pattern to avoid this artifact?
It happens because of multiple enumeration of your Select query.
In order to fix it, force enumeration by calling ToList() method. Then it will work correctly.
var tasks = items.Select(item =>
{
var clone = item.MakeCopy();
clone.Id = Guid.NewGuid();
return dbAccess.SaveAsync(clone);
})
.ToList();
Also you may take a look at these more detailed answers:
https://stackoverflow.com/a/8240935/3872935
https://stackoverflow.com/a/20129161/3872935.
I took a second to look why my app had terrible performance. All i did was pause the debugger twice and i found it.
Is there a practical reason why it runs my code everytime? The only way i know to prevent this is to add ToArray() at the end. I guess i need to revise all my code and make sure they return arrays?
Online demo http://ideone.com/EUfJN
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Test
{
static void Main()
{
string[] test = new string[] { "a", "sdj", "bb", "d444"};
var expensivePrint = false;
IEnumerable<int> ls = test.Select(s => { if (expensivePrint) { Console.WriteLine("Doing expensive math"); } return s.Length; });
expensivePrint = true;
foreach (var v in ls)
{
Console.WriteLine(v);
}
Console.WriteLine("If you dont think it does it everytime, lets try it again");
foreach (var v in ls)
{
Console.WriteLine(v);
}
}
}
Output
Doing expensive math
1
Doing expensive math
3
Doing expensive math
2
Doing expensive math
4
If you dont think it does it everytime, lets try it again
Doing expensive math
1
Doing expensive math
3
Doing expensive math
2
Doing expensive math
4
Enumerables evaluate lazily (only when required). Add a .ToList() after the select and it will force evaluation.
LINQ has lazy evaluation methods and Select is one of them.
And the thing is you are using foreach two times and it prints the values two times.
The Select causes the iterator to be... iterated.
If it is expensive to build the result, you can .ToList() the result once, then use that list going forward.
List<int> resultAsList = ls.ToList();
// Use resultAsList in each of the foreach statements
When you are building the query
IEnumerable<int> ls = test.Select(s => { if (expensivePrint) { Console.WriteLine("Doing expensive math"); } return s.Length; });
It actually does not EXECUTE and cache the result as you are apparently expecting. This is called "deffered execution".
It just builds the query. The execution of the query actually takes place when the foreach statement is called on the query.
If you call ToList() or ToArray() or Sum() or Average() or any operator of the kind on your query, it will however execute it IMMEDIATELY.
The best thing to do if you want to keep the result of the query, is to cache it in a array or list by calling ToList() or ToArray() and to enumerate on this list or array rather than on the constructed query.
Please refer to the documentation of Enumerable.Select
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
By iterating the result of the Select method, the query is executed. foreach is one way to iterate that result. ToArray is another.
Is there a practical reason why it runs my code everytime?
Yes, if the result was not deferred, then more iteration would be performed than necessary:
IEnumerable<string> query = Enumerable.Range(0, 100000)
.Select(x => x.ToString())
.Where(s => s.Length == 6)
.Take(5);
This is Linq's deferred execution. If you need a concise yet complete explanation, read this:
http://weblogs.asp.net/dixin/archive/2010/03/16/understanding-linq-to-objects-6-deferred-execution.aspx
I would suggest you use .ToArray() which will return int[] for you and will give better performance
Reason why int[] because it will be declare and create at once, well as the List<T> wil be created one by one at runtime
int[] array = test.Select(s =>
{
if (expensivePrint)
{
Console.WriteLine("Doing expensive math");
}
return s.Length;
}).ToArray();