Confused About Why I have no extension methods available - c#

I'm implementing a Unit of Work pattern (not my choice; I know some consider it an anti-pattern).
I've come across a situation I don't fully understand.
My generic repo constructor:
this.context = context;
this.dbSet = context.Set<T>();
Generic method:
public virtual async Task<IEnumerable<T>> All()
{
return await dbSet.ToListAsync();
}
Using it like this:
var languages = await _unitOfWork.Languages.All();
languages = languages.OrderBy(x => x.Order);
As shown above, I need to go down a line before I can use OrderBy, and I don't understand why.
My second question: ToListAsync is supposed to return a list, so why do I get an IEnumerable?

A second question is, ToListAsync is supposed to return a list, why do I get an IEnumerable?
List<T> is an implementation of IEnumerable<T>, so a List<T> can be returned wherever an IEnumerable<T> is declared. See "What does it mean to 'program to an interface'?".
As shown above, I need to go down a line so I can use OrderBy, I don't know why.
_unitOfWork.Languages.All() returns a Task<IEnumerable<T>>, not an IEnumerable<T>. OrderBy is an extension method on IEnumerable<T>, so you have to unwrap (await) the task's result before you can apply OrderBy to it.
To make this work as you expected, you should apply OrderBy on the awaited result:
(await _unitOfWork.Languages.All()).OrderBy(x => x.Order);

Your confusion about having both on the same line seems to come from how the await is applied. This should work fine:
var languages = (await _unitOfWork.Languages.All()).OrderBy(x => x.Order);
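You need the parentheses because your All method returns a Task, while OrderBy is an extension method on IEnumerable<T>. Without them, .OrderBy would bind to the Task returned by All() before await is applied, and Task has no OrderBy.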
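A minimal, self-contained sketch of the precedence issue. The local function All() is a hypothetical stand-in for _unitOfWork.Languages.All() (no EF Core or repository is involved), and the (Name, Order) tuple stands in for the Language entity:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for _unitOfWork.Languages.All(): the declared return
// type is Task<IEnumerable<T>>, just like the repository method in the question.
static async Task<IEnumerable<(string Name, int Order)>> All()
{
    await Task.Yield(); // simulate asynchronous database work
    return new List<(string Name, int Order)> { ("French", 2), ("English", 1) };
}

// All().OrderBy(...) would not compile: OrderBy extends IEnumerable<T>, and
// All() returns a Task. Parenthesizing the await applies OrderBy to the
// awaited IEnumerable<T> instead.
var languages = (await All()).OrderBy(x => x.Order).ToList();

Console.WriteLine(string.Join(", ", languages.Select(x => x.Name))); // English, French
```

Writing `await All().OrderBy(...)` fails for the same reason: member access binds more tightly than await, so the parentheses are what move OrderBy onto the result.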
Additionally, you probably don't want to call .ToListAsync() in the first place: it issues a SELECT * without any limit or WHERE clause. That may not be the end of the world every time, but it will often be extremely detrimental to your performance.
As for your second question, it returns an IEnumerable<T> because List<T> implements IEnumerable<T> and your method signature declares IEnumerable<T> as the return type:
public virtual async Task<IEnumerable<T>> All()
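To illustrate the second answer, here is a small sketch, assuming a hypothetical All() that returns a List<int> through a Task<IEnumerable<int>> signature. It shows that the static type you see is IEnumerable<T> while the runtime object is still a List<T>:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical method shaped like the question's All(): the body builds a
// List<T> (as ToListAsync() would), but the signature only promises IEnumerable<T>.
static async Task<IEnumerable<int>> All()
{
    await Task.Yield();
    return new List<int> { 3, 1, 2 };
}

// The compiler only exposes IEnumerable<int> members on `items`...
IEnumerable<int> items = await All();

// ...but the object behind it is still the List<int> that was returned.
bool isList = items is List<int>;
Console.WriteLine(isList); // True
```

Changing the signature to Task<List<T>> would expose List<T> members to callers, at the cost of tying them to that concrete type.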
Instead of dealing with .ToListAsync() or IEnumerable<T> at all, I highly recommend leaving the query alone and letting IQueryable<T> do its proper work against the database.
EDIT: It's been asked in the comments why it matters that you use IQueryable<T>. The answer is that applying .Where(), .Any(), etc. to an IQueryable<T> shapes the underlying query that is sent to the database.
For example, say you want to find a single entity with an Id of 123. If you keep your .ToListAsync() without any other modifications, it will pull every single row from the database into memory and then iterate over every row looking for that one.
If you use IQueryable<T> instead, the .FirstOrDefault(e => e.Id == 123) you apply is translated and run by the database as something like SELECT TOP(1) * FROM MyEntity WHERE [Id] = 123, pulling back a single row instead of every row in existence.
EDIT2: Note that this also applies to projection. This means something like .Select(e => new { FullName = e.FirstName + " " + e.LastName, State = e.Address.State}) will only pull those 2 columns instead of all columns and any nav properties you include. Doing that after a .ToList() or .ToListAsync will instead pull back all columns/entities and iterate over them to create a whole new set of other entities. This can result in MASSIVE CPU/Memory differences between the 2 approaches.
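A sketch of the mechanism behind this, with no database involved: an in-memory array is turned into an IQueryable<T> with AsQueryable(), and the anonymous { Id, Name } entity shape is a hypothetical stand-in for a DbSet<T>. It shows that Where and Select on IQueryable<T> only build an expression tree; a provider such as EF Core is what turns that tree into SQL when an executing operator runs:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// In-memory stand-in for a DbSet<T>; the anonymous { Id, Name } shape is hypothetical.
var entities = new[]
{
    new { Id = 122, Name = "a" },
    new { Id = 123, Name = "b" },
}.AsQueryable();

// On IQueryable<T>, Where and Select take expression trees, so nothing executes yet:
// each call wraps the previous query expression in a new method-call node.
IQueryable<string> query = entities.Where(e => e.Id == 123).Select(e => e.Name);

var selectCall = (MethodCallExpression)query.Expression;
var whereCall = (MethodCallExpression)selectCall.Arguments[0];
Console.WriteLine($"{selectCall.Method.Name} <- {whereCall.Method.Name}"); // Select <- Where

// An executing operator hands the whole tree to the query provider. EF Core would
// translate it to SQL; the in-memory EnumerableQuery provider just compiles and runs it.
var name = query.FirstOrDefault();
Console.WriteLine(name); // b
```

Because the projection is part of the tree too, a SQL provider can see that only Name is needed; after ToList() that information is gone and the whole entity has already been loaded.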

Related

Entity Framework using await without ToListAsync()

The following code works.
employees.AddRange(from e in DbContext.Employees
select new EmployeeSummaryModel
{
Name = e.Name,
Email = e.Email
});
But how would I rewrite this query to use await? Note that I don't want to use ToListAsync(), as that unnecessarily creates a second list.
AddRange is inherently synchronous. There are multiple options to make this asynchronous, depending on how the results are used:
Just a single list
One option would be to use the list returned by ToListAsync instead of creating a list in advance:
var employees= await query.ToListAsync();
There's no reason to have two lists if the first one is empty.
Use async stream
Another option is to use AsAsyncEnumerable to execute the query asynchronously and get back an IAsyncEnumerable. This has to be iterated to copy the data to the existing list item by item:
await foreach(var emp in query.AsAsyncEnumerable())
{
employees.Add(emp);
}
This is useful if the original list contains data already. Both Add and AddRange are going to cause reallocations of the list's internal buffer though.
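A runnable sketch of the async-stream option. QueryEmployeesAsync is a hypothetical async iterator standing in for query.AsAsyncEnumerable(), so no EF Core is needed; the point is that rows are appended to an existing list one at a time as they arrive:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for query.AsAsyncEnumerable(): an async iterator that
// produces rows one at a time, awaiting between them as a real reader would.
static async IAsyncEnumerable<string> QueryEmployeesAsync()
{
    foreach (var name in new[] { "Ann", "Bob", "Cid" })
    {
        await Task.Delay(1); // simulate waiting for the next row from the database
        yield return name;
    }
}

// The list already contains data; new rows are appended as they arrive,
// without first building a second list.
var employees = new List<string> { "Existing" };
await foreach (var emp in QueryEmployeesAsync())
{
    employees.Add(emp);
}

Console.WriteLine(string.Join(", ", employees)); // Existing, Ann, Bob, Cid
```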
Use an empty list with capacity
If you have even a rough idea of the number of results, you could create a list with a specific capacity and avoid at least some reallocations:
var employees=new List<Employee>(100);
await foreach(var emp in query.AsAsyncEnumerable())
{
employees.Add(emp);
}
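The effect of pre-sizing can be observed through List<T>.Capacity. This sketch assumes a hypothetical QueryIdsAsync source standing in for query.AsAsyncEnumerable() and an estimate of 100 rows; as long as the actual count stays within the estimate, the internal buffer is never reallocated:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical async source standing in for query.AsAsyncEnumerable().
static async IAsyncEnumerable<int> QueryIdsAsync(int count)
{
    for (var i = 0; i < count; i++)
    {
        await Task.Yield(); // simulate asynchronous row retrieval
        yield return i;
    }
}

// Pre-size the list with a rough estimate of the result count.
var employees = new List<int>(100);
var initialCapacity = employees.Capacity;

await foreach (var id in QueryIdsAsync(80))
{
    employees.Add(id);
}

// 80 items fit into the initial buffer, so List<T> never had to grow it.
Console.WriteLine($"{employees.Count} items, capacity {employees.Capacity}"); // 80 items, capacity 100
```

If the estimate is too low, the list simply grows as usual, so an approximate number is still useful.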
Async pipeline
If you have a lot of data, though, you probably shouldn't store it in a list even temporarily. You could create a pipeline of methods that accept and return IAsyncEnumerable and process the data as it arrives, only caching it at the end. You could use System.Linq.Async's operators for this:
var results = await query.AsAsyncEnumerable()
.SelectAwait(async emp => await EnrichFromApiAsync(emp, someUrl))
...
.ToListAsync();
Or convert it to an Observable with ToObservable() and process the final transformed data as it reaches the end of the pipeline.
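System.Linq.Async is a separate NuGet package, so this sketch avoids it and hand-writes a single pipeline stage instead, to show the streaming idea. QueryAsync, EnrichAsync and SelectAwaitSketch are all hypothetical names (EnrichAsync plays the role of EnrichFromApiAsync); each item flows through the stage as soon as it is produced, and only the final loop collects results:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical source standing in for query.AsAsyncEnumerable().
static async IAsyncEnumerable<int> QueryAsync()
{
    for (var id = 1; id <= 3; id++)
    {
        await Task.Yield();
        yield return id;
    }
}

// Hypothetical enrichment step, standing in for something like EnrichFromApiAsync.
static async Task<string> EnrichAsync(int id)
{
    await Task.Delay(1); // simulate an HTTP call
    return $"employee-{id}";
}

// A hand-written pipeline stage: awaits the async selector for each item and
// streams the result onward, so nothing is buffered between stages.
static async IAsyncEnumerable<TResult> SelectAwaitSketch<TSource, TResult>(
    IAsyncEnumerable<TSource> source, Func<TSource, Task<TResult>> selector)
{
    await foreach (var item in source)
    {
        yield return await selector(item);
    }
}

// Only the end of the pipeline materializes the results.
var results = new List<string>();
await foreach (var enriched in SelectAwaitSketch(QueryAsync(), id => EnrichAsync(id)))
{
    results.Add(enriched);
}

Console.WriteLine(string.Join(", ", results)); // employee-1, employee-2, employee-3
```

Further stages can be chained the same way, because each one both accepts and returns an IAsyncEnumerable<T>.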
I don't see how you can use AddRange in this case but with latest EF Core and C# 8.0 async streams you can use AsAsyncEnumerable method resulting in:
var query = ...;
await foreach(var e in query.AsAsyncEnumerable())
{
employees.Add(e);
}

Linq Select query which returns a task enumerates multiple times

I have a linq query which returns a task object and stores it in an IEnumerable. For some reason the select query keeps enumerating until the task is started or finished (I think, it's hard to debug).
The query is pretty straightforward:
Context.RetrieveDataTasks = retrievableProducts.Select(product => Context.HostController.RetrieveProductDataFiles(product));
where the signature for RetrieveProductDataFiles is:
public Task RetrieveProductDataFiles(IProduct product)
The retrievableProducts is in this case a list of 1 product:
var retrievableProducts = products
.Where(product => AFancyButIrrelevantClause)
.ToList();
I don't mind rewriting the code as a foreach loop that fills a new list manually to avoid this problem, but I'd like to understand why the Select query keeps executing. I think it has something to do with the task waiting for activation, but I have no idea why that would cause this.
Edit:
Just to be complete, I'd expect the above code to work exactly the same as:
var retrievableDataTasks = new List<Task>();
foreach (var product in retrievableProducts)
{
retrievableDataTasks.Add(Context.HostController.RetrieveProductDataFiles(product));
}
Context.RetrieveDataTasks = retrievableDataTasks;
The construction with a foreach does exactly what I expect: it populates a list with tasks (in this specific case a list of 1 task), and this task is executed once. With the Select query, however, that same 1 task is started over and over again.
I hope it's clear enough from the code I provided. I'm looking forward to learning why the Select query behaves differently (and, if possible, how to prevent it).
The correct answer to the posted question is by "bas" himself. Every time you reference the IEnumerable it re-evaluates the expression inside of "Select" and thus starts the tasks again. "ToList" would actually fix the problem because it would stop evaluating them.
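A minimal reproduction of the effect, assuming a hypothetical RetrieveProductDataFiles that only counts how often it is called. Every enumeration of the lazy Select runs the selector again, while ToList() runs it once and stores the resulting tasks:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var starts = 0;

// Hypothetical stand-in for HostController.RetrieveProductDataFiles:
// every call represents starting the download work again.
Task RetrieveProductDataFiles(string product)
{
    starts++;
    return Task.CompletedTask;
}

var products = new List<string> { "product-1" };

// Lazy: the variable holds the query, not the tasks it produces.
IEnumerable<Task> lazyTasks = products.Select(p => RetrieveProductDataFiles(p));
await Task.WhenAll(lazyTasks); // enumeration #1 calls the selector
await Task.WhenAll(lazyTasks); // enumeration #2 calls it again
var lazyStarts = starts;

starts = 0;

// Materialized: ToList() runs the selector once per product and stores the tasks.
List<Task> tasks = products.Select(p => RetrieveProductDataFiles(p)).ToList();
await Task.WhenAll(tasks);
await Task.WhenAll(tasks);
var materializedStarts = starts;

Console.WriteLine($"lazy: {lazyStarts}, materialized: {materializedStarts}"); // lazy: 2, materialized: 1
```

In the question, Context.RetrieveDataTasks plays the role of lazyTasks, so every piece of code that enumerates it starts the work again.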
Note that ToList forces the iterator to run through the whole collection, even if you think you only asked for 'simply give me the first two items in the collection'. If that collection has 1000 elements, ToList iterates until it reaches the last item, and you'll still end up with only 2 elements.
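To make the cost visible, this sketch uses a hypothetical Numbers iterator that counts how many elements it actually produces, then compares Take(2) applied lazily with Take(2) applied after ToList():

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var pulled = 0;

// Hypothetical source that counts how many elements are actually produced.
IEnumerable<int> Numbers(int count)
{
    for (var i = 0; i < count; i++)
    {
        pulled++;
        yield return i;
    }
}

// Lazy: Take(2) stops pulling from the iterator once it has two elements.
var firstTwoLazy = Numbers(1000).Take(2).ToList();
var lazyPulled = pulled;

pulled = 0;

// ToList() first: all 1000 elements are produced before Take(2) ever runs.
var firstTwoEager = Numbers(1000).ToList().Take(2).ToList();
var eagerPulled = pulled;

Console.WriteLine($"lazy pulled {lazyPulled}, eager pulled {eagerPulled}");
```

Both variants return the same two elements; only the amount of work done to get them differs.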
You consume an iterator method by using a foreach statement or LINQ query. Each iteration of the foreach loop calls the iterator method. When a yield return statement is reached in the iterator method, the expression is returned, and the current location in code is retained. Execution is restarted from that location the next time the iterator function is called.
In your method where you instantiate a list and add to it, you could improve things a little by using yield return, and thus avoid allocating data that doesn't need to be allocated. LINQ methods are lazily evaluated, which means there won't be any memory allocation for the data until you try to materialize the results (with ToList, for instance). While you're inside your LINQ method, the only memory used is for the current iteration, not for everything in your collection.
The following code snippet may help:
private static IEnumerable<Product> GetMyProducts(IEnumerable<Product> products, bool AFancyButIrrelevantClause)
{
foreach(var product in products)
{
if(AFancyButIrrelevantClause)
yield return product;
}
}
or directly in LINQ to be more concise:
products.Where(product => AFancyButIrrelevantClause)

Using LINQ Where result in foreach: hidden if statement, double foreach?

foreach (Person criminal in people.Where(person => person.isCriminal))
{
// do something
}
I have this piece of code and want to know how it actually works. Is it equivalent to an if statement nested inside the foreach iteration, or does it first loop through the list of people and then repeat the loop with the selected values? I care to know more about this from the perspective of efficiency.
foreach (Person criminal in people)
{
if (criminal.isCriminal)
{
// do something
}
}
Where uses deferred execution.
This means that the filtering does not occur immediately when you call Where. Instead, each time you call GetEnumerator().MoveNext() on the return value of Where, it checks if the next element in the sequence satisfies the condition. If it does not, it skips over this element and checks the next one. When there is an element that satisfies the condition, it stops advancing and you can get the value using Current.
Basically, it is like having an if statement inside a foreach loop.
To understand what happens, you must know how IEnumerable<T> works (because LINQ to Objects always works on IEnumerable<T>). An IEnumerable<T> returns an IEnumerator<T>, which implements an iterator. This iterator is lazy, i.e. it only yields one element of the sequence at a time. No looping is done in advance unless you have an OrderBy or another operator that requires it.
So if you have ...
foreach (string name in source.Where(x => x.IsChecked).Select(x => x.Name)) {
Console.WriteLine(name);
}
... this will happen: The foreach-statement requires the first item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The first name is printed to the console.
Then the foreach-statement requires the second item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The second name is printed to the console.
and so on.
This means that both of your code snippets are logically equivalent.
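The pull order described above can be observed directly. In this sketch the (Name, IsChecked) tuples are a hypothetical stand-in for the answer's source, and each stage logs when it is asked for an element:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();
var source = new[]
{
    (Name: "a", IsChecked: true),
    (Name: "b", IsChecked: false),
    (Name: "c", IsChecked: true),
};

// Each stage records when it is asked for an element, exposing the pull order.
var names = source
    .Where(x => { log.Add($"Where {x.Name}"); return x.IsChecked; })
    .Select(x => { log.Add($"Select {x.Name}"); return x.Name; });

foreach (var name in names)
{
    log.Add($"Print {name}");
}

Console.WriteLine(string.Join(Environment.NewLine, log));
```

The log reads Where a, Select a, Print a, Where b, Where c, Select c, Print c: one pass over the source, with "b" rejected inside Where exactly as an if inside the loop would reject it.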
It depends on what people is.
If people is an IEnumerable object (like a collection, or the result of a method using yield) then the two pieces of code in your question are indeed equivalent.
A naïve Where could be implemented as:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
// Error handling left out for simplicity.
foreach (TSource item in source)
{
if (predicate(item))
{
yield return item;
}
}
}
The actual code in Enumerable is a bit different to make sure that errors from passing a null source or predicate happen immediately rather than on the deferred execution, and to optimise for a few cases (e.g. source.Where(x => x.IsCriminal).Where(x => x.IsOnParole) is turned into the equivalent of source.Where(x => x.IsCriminal && x.IsOnParole) so that there's one fewer step in the chains of iterations), but that's the basic principle.
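One common way to get the eager argument checks mentioned above is to validate in an ordinary method and move the loop into an inner iterator. The sketch below uses a hypothetical MyWhere; it illustrates the pattern and is not the actual Enumerable source:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical re-implementation (MyWhere): argument checks run eagerly in the
// outer method, while the filtering loop runs lazily in the inner iterator.
static IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    if (predicate is null) throw new ArgumentNullException(nameof(predicate));
    return Iterator();

    // Only this inner iterator is deferred.
    IEnumerable<T> Iterator()
    {
        foreach (var item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}

var threwEagerly = false;
try
{
    // The result is never enumerated, yet the null predicate is reported immediately.
    _ = MyWhere<int>(new[] { 1, 2, 3 }, null!);
}
catch (ArgumentNullException)
{
    threwEagerly = true;
}

var evens = new List<int>(MyWhere(new[] { 1, 2, 3, 4 }, n => n % 2 == 0));
Console.WriteLine($"threw eagerly: {threwEagerly}, evens: {string.Join(",", evens)}"); // threw eagerly: True, evens: 2,4
```

If the null checks were placed inside the iterator itself, the exception would only surface at the first MoveNext(), possibly far from the faulty call.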
If however people is an IQueryable then things are different, and depend on the details of the query provider in question.
The simplest possibility is that the query provider can't do anything special with the Where and so it ends up just doing pretty much the above, because that will still work.
But often the query provider can do something else. Let's say people is a DbSet<Person> in Entity Framework associated with a table in a database called people. If you do:
foreach(var person in people)
{
DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
And then create a Person object for each row returned. We could do the same filtering in .NET, as the naive Where above does, but we can also do better.
If you do:
foreach (Person criminal in people.Where(person => person.isCriminal))
{
DoSomething(criminal);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
WHERE isCriminal = 1
This means that the logic of deciding which elements to return is executed in the database before anything comes back to .NET. It allows indices to be used in computing the WHERE, which can be much more efficient. Even in the worst case, where there are no useful indices and the database has to do a full scan, the records we don't care about are never sent back from the database and no objects are created for them just to be thrown away again, so the difference in performance can be immense.
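Whether the filter can reach the database depends on the static type Where is called on. This sketch uses an in-memory AsQueryable() source with a hypothetical anonymous Person shape (no EF involved) to show that an IQueryable<T> binds to Queryable.Where and stays a query, while the same call after AsEnumerable() binds to Enumerable.Where and runs in .NET:

```csharp
using System;
using System.Linq;

// In-memory stand-in for a DbSet<Person>; the anonymous shape is hypothetical.
var people = new[]
{
    new { Name = "a", IsCriminal = true },
    new { Name = "b", IsCriminal = false },
}.AsQueryable();

// Static type IQueryable<T>: binds to Queryable.Where, so the filter becomes part of
// the query expression a provider such as EF could translate into a WHERE clause.
var filteredByProvider = people.Where(p => p.IsCriminal);

// Static type IEnumerable<T>: binds to Enumerable.Where, so the filter runs in .NET
// over whatever rows the source hands back.
var filteredInMemory = people.AsEnumerable().Where(p => p.IsCriminal);

var providerSide = filteredByProvider is IQueryable;
var clientSide = filteredInMemory is IQueryable;
Console.WriteLine($"{providerSide} {clientSide}"); // True False
```

Both give the same rows here; against a real database, only the first one would let the server skip the rows you don't want.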
I care to know more about this from the perspective of efficiency
You are hopefully satisfied that there's no double pass, as you suggested might happen, and happy to learn that, when the source allows it, Where can be even more efficient than the foreach/if combination you suggested.
A bare foreach and if will still beat .Where() against an IEnumerable (but not against a database source) as there are a few overheads to Where that foreach and if don't have, but it's to a degree that is only worth caring about in very hot paths. Generally Where can be used with reasonable confidence in its efficiency.

Predicate inside a Where<> query not hit?

I need to call the method for each item in the list, so I used a Where<> query as follows:
List<string> list = new List<string>();
list.Add("Name1");
list.Add("Name2");
list.Add("Name3");
var name = list.Where(n =>
{
return CheckName(n);
});
But in the above case, CheckName() is not hit. The same method is triggered if I use FirstOrDefault<>. I don't know whether it is a framework break or I am going in a wrong way.
As additional info, I am using .NET Framework 4.5.
Has anyone experienced this error? If so, is there any solution to overcome this issue?
You are misunderstanding the result of the Where call. Because LINQ uses deferred execution, it only enters the Where predicate when the query is materialized (by ToList, FirstOrDefault, Sum and the like).
The Where is never actually materialized in your current code (it was when you used FirstOrDefault, as you experienced), so CheckName is never entered. Then, since Where never returns null but, in the worst case, an empty collection (which is not null), the result is true.
If you debug you will see that name equals true at the end of this. To "overcome" this depends on what is your desired output:
If you want to know if you have any item that matched the predicate then:
var result = list.Any(CheckName);
If you want to retrieve those that match the predicate:
var result = list.Where(CheckName);
If later you want to query and check if results contains anything then:
if(result.Any()) { /* ... */ }
If you only want the results (and thus materializing the query):
list.Where(CheckName).ToList();
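A small sketch that counts calls to a hypothetical CheckName (here it simply matches "Name2") to show when each of the options above actually runs the predicate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var calls = 0;

// Hypothetical CheckName that records every invocation.
bool CheckName(string n)
{
    calls++;
    return n == "Name2";
}

var list = new List<string> { "Name1", "Name2", "Name3" };

var query = list.Where(CheckName);            // deferred: CheckName is not called yet
var afterWhere = calls;

var any = list.Any(CheckName);                // stops at the first match ("Name2")
var afterAny = calls;

var matches = list.Where(CheckName).ToList(); // materializes: checks all three names
var afterToList = calls;

Console.WriteLine($"{afterWhere} {afterAny} {afterToList}"); // 0 2 5
```

Note that `query` is still unevaluated at the end; enumerating it later would call CheckName three more times.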
Read more about LINQ deferred execution here:
Linq and deffered execution
What are the benefits of a Deferred Execution in LINQ?
Just as a side note see how you can change your current code from:
var name = list.Where(n =>
{
return CheckName(n);
});
To:
var name = list.Where(n => CheckName(n));
And eventually to:
var name = list.Where(CheckName);
LINQ follows the deferred execution principle, which means the query is not executed until you actually access the name variable. If you want to execute it immediately, add .ToList() at the end (just as an example); FirstOrDefault likewise executes the query immediately instead of deferring it.
var name = list.Where(n =>
{
return CheckName(n);
}).ToList() != null;
Also, the result of Where will never be null. Even if no object in the list satisfies your condition(s) in CheckName, Where will return an empty collection.
The CheckName() method is not executed because of Deferred execution of Linq. The actual statement is not executed till you actually access it. So in your case, for the CheckName(), you should do something like:
var name = list.Where(n =>
{
return CheckName(n);
}).ToList();
If you look at a simplified version of the Where method's source code, you can easily see why:
internal static IEnumerable<T> Where<T>(this IEnumerable<T> enumerable, Func<T, bool> where) {
foreach (T t in enumerable) {
if (where(t)) {
yield return t;
}
}
}
The yield will cause the execution to only happen once the returned IEnumerable<T> is actually accessed. That is what is called deferred execution.
If you need to call a method for each item in a list, you should use a simple foreach loop:
foreach (var name in list)
CheckName(name);
Just because LINQ is available doesn't mean it should be used everywhere there is a collection. It is important to write code that makes sense and is self-documenting; using LINQ here has simultaneously introduced a flaw into your logic and made your code harder to read, understand and maintain. It's the wrong tool for the stated purpose.
Doubtless you have additional requirements not stated here, like "I want to check every name in a list and make sure that none are null". You can, and possibly should, use LINQ for this, but it looks more like:
bool allNamesOK = list.All(n => n != null);
This code is compact and reads well; we can clearly see the intention (though I wouldn't call the list "list"; "names" would be better).

Trying to figure out when queries in Entity Framework are executed

This is my first time working with Entity Framework (EF) and I'm trying to learn what exactly executes a query on my database and what doesn't.
This is the code I'm working with. Don't mind the functionality, it isn't important for this question.
using (var db = new Context())
{
//Check if any reviews have been given.
if (combinedReviews.Any())
{
var restaurantsReviewedIds = combinedReviews.Select(rev => rev.RestaurantId);
//(1)
ratedRestaurants = db.Restaurants.Where(rest => restaurantsReviewedIds.Contains(rest.Id))
.DistinctBy(rest => rest.Id)
.ToList();
}
//(2)
var restsClose = db.Restaurants.Where(rest => db.Reviews.Any(rev => rev.RestaurantId == rest.Id))
.OrderBy(rest => rest.Location.Distance(algorithmParams.Location))
.Take(algorithmParams.AmountOfRecommendations);
//(3)
tempList = ratedRestaurants.Union(restsClose).ToList();
var tempListIds = tempList.Select(rest => rest.Id); //Temporary list.
//(4)
restsWithAverage = db.Reviews.Where(rev => tempListIds.Contains(rev.RestaurantId))
.GroupBy(rev => rev.RestaurantId)
.ToList();
}
I have marked each piece of code with numbers, so I'll refer to them with that. Below is what I think is what happens.
(1) This executes a query since I'm calling .ToList() here.
(2) This returns an IQueryable, so this won't execute a query against the database.
(3) This executes the query from (2).
(4) This executes another query since I'm calling .ToList().
How close to the truth am I? Is all of this correct? If this doesn't make sense, could you give an example what executes a query and what doesn't?
I'm sorry for asking so many questions in one question, but I thought I wouldn't need to create so many questions since all of this is about a single topic.
If you don't want to execute a query you can use AsEnumerable.
ToList vs AsEnumerable
ToList – converts an IEnumerable<T> to a List<T>. The advantage of using AsEnumerable vs. ToList is that AsEnumerable does not execute the query. AsEnumerable preserves deferred execution and does not build an often useless intermediate list.
On the other hand, when forced execution of a LINQ query is desired, ToList can be a way to do that.
You could also force execution by putting a foreach loop immediately after the query expression, but by calling ToList or ToArray you cache all the data in a single collection object.
ToLookup and ToDictionary also execute the query.
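A sketch contrasting the two, using an in-memory AsQueryable() source instead of EF. Rows is a hypothetical iterator that counts each full enumeration as one "query execution". AsEnumerable only changes the static type (in EF, that means later operators run in .NET), whereas ToList executes once and caches the rows:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var executions = 0;

// Hypothetical data source: each enumeration counts as one "query execution".
IEnumerable<int> Rows()
{
    executions++;
    yield return 1;
    yield return 2;
}

IQueryable<int> query = Rows().AsQueryable().Where(r => r > 0);

// AsEnumerable only changes the static type; nothing is executed.
IEnumerable<int> deferred = query.AsEnumerable();
var afterAsEnumerable = executions;

// ToList executes the query once and caches the rows.
List<int> cached = query.ToList();
var afterToList = executions;

// Reading the cached list does not re-execute; enumerating the deferred sequence does.
var cachedSum = cached.Sum();
var deferredSum = deferred.Sum();
var afterBothSums = executions;

Console.WriteLine($"{afterAsEnumerable} {afterToList} {afterBothSums}"); // 0 1 2
```

So AsEnumerable avoids building an intermediate list, but every enumeration of the result runs the query again, while a ToList result can be reused freely.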
Here you can find a list of operators and if they are executing query:
https://msdn.microsoft.com/en-us/library/mt693095.aspx.
Linq query execution is different per query. I recommend reading the following page: https://msdn.microsoft.com/en-us/library/bb738633(v=vs.110).aspx
