Converting a SQL server query to EF Core LINQ - c#

I am currently trying to convert an existing SQL server query to EF Core. The goal is to get all users and get their latest order date-time and latest support request date-time. I want to ensure users are returned even if they don't have an order yet or a support request yet. If they have not placed an order, the column for "latest order date-time" should be NULL. If they have not filed a support request, the column for "latest support request date-time" should be NULL.
The outputted columns should be: Id, Name, Email, LatestOrderDateTime, LatestSupportRequestDateTime
Here is my working SQL server query:
SELECT [User].[Id], [User].[Name], [User].[Email], MAX([Order].[DateTime]) as LatestOrderDateTime, MAX([SupportRequest].[DateTime]) as LatestSupportRequestDateTime FROM [User]
LEFT JOIN [Order] on [User].[Id] = [Order].[UserId]
LEFT JOIN [SupportRequest] on [User].[Id] = [SupportRequest].[ConsumerId]
GROUP BY [User].[Id], [User].[Name], [User].[Email]
ORDER BY [User].[Id]
This is what I've tried, however it does not evaluate on the server:
await this.context.User
.GroupBy(u => new { u.Id, u.Name, u.Email })
.Select(g => new
{
id = g.Key.Id,
name = g.Key.Name,
email = g.Key.Email,
lastOrderDateTime = g.Max(o => o.Orders.Select(o => o.DateTime)),
lastSupportRequestDateTime = g.Max(o => o.SupportRequests.Select(s => s.DateTime)),
})
.OrderBy(c => c.id)
.ToListAsync();
I just want to convert this query to EF core (where the query DOES NOT get evaluated locally).
If you could do it in method syntax, that'd be great, but no worries if not since I can convert it with JetBrains Rider.
Thank you so much for your help!

I just want to convert this query to EF core (where the query DOES NOT get evaluated locally).
Can not be done, use EntityFramework 6.4, not core, if you want this.
The SQL generation in current EntityFramework Core (and I mean current up to the nightly builds of version 5) is EXTREMELY limited in the SQL it can generate, and the team seems unwilling to even acknowledge that fact (which reminds me of the days of EntityFramework 2 and 3, until that team started being serious about LINQ in version 4).
If it tells you it cannot generate this as SQL, then your only 2 choices are:
Use EntityFramework 6.4 (which works on .NET Core 3.1) and get server-side execution.
Open a bug report, HOPE someone picks it up, and then either wait until November for the release of version 5 OR, once it is possibly fixed, work with nightly builds until then.
This is not a syntax issue. They deactivated client evaluation, and their SQL generator is not able to handle a LOT of standard cases. Given that you do not want the first option (which is what we do at the moment), their feature set just means it cannot be done.

You could try to explicitly spell out the left joins in LINQ (the left join syntax is a bit unintuitive, IIRC, so it may take some doing to sort out).
You can find more information at:
https://learn.microsoft.com/en-us/dotnet/csharp/linq/perform-left-outer-joins
The way your LINQ is set up specifically asks for object linking, which is why it happens client-side. I believe what you're trying to do has a solution in EF Core.
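As a rough illustration of the left-join shape described above, here is a minimal sketch that runs against in-memory collections with LINQ to Objects. The sample data and the names PlacedAt and FiledAt are hypothetical stand-ins for the DateTime columns, and nothing here is verified against a particular EF Core version, so whether the same shape translates to SQL would need to be checked against your provider.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the User, Order and SupportRequest tables.
var users = new[]
{
    (Id: 1, Name: "Ann", Email: "ann@example.com"),
    (Id: 2, Name: "Bob", Email: "bob@example.com"),
};
var orders = new[]
{
    (UserId: 1, PlacedAt: new DateTime(2020, 1, 5)),
    (UserId: 1, PlacedAt: new DateTime(2020, 3, 9)),
};
var supportRequests = new[]
{
    (ConsumerId: 2, FiledAt: new DateTime(2020, 2, 1)),
};

// join ... into + DefaultIfEmpty() is the LINQ spelling of a LEFT JOIN.
// Projecting to DateTime? first makes the "no match" row a null date.
var latestPerUser =
    (from u in users
     join o in orders on u.Id equals o.UserId into userOrders
     from orderDate in userOrders.Select(o => (DateTime?)o.PlacedAt).DefaultIfEmpty()
     join s in supportRequests on u.Id equals s.ConsumerId into userRequests
     from requestDate in userRequests.Select(s => (DateTime?)s.FiledAt).DefaultIfEmpty()
     group new { orderDate, requestDate } by new { u.Id, u.Name, u.Email } into g
     orderby g.Key.Id
     select new
     {
         g.Key.Id,
         g.Key.Name,
         g.Key.Email,
         // Max over a nullable column ignores nulls and yields null when all are null.
         LatestOrderDateTime = g.Max(x => x.orderDate),
         LatestSupportRequestDateTime = g.Max(x => x.requestDate),
     }).ToList();

foreach (var row in latestPerUser)
{
    Console.WriteLine($"{row.Id} {row.Name} {row.LatestOrderDateTime} {row.LatestSupportRequestDateTime}");
}
```

The casts to DateTime? are the important part: they let a user without orders or support requests come back with a null in that column, which matches the NULL the SQL query produces.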

Related

My join .NetCore 3.1 throws an exception about NavigationExpandingExpressionVisitor, what is that?

I have a .NetCore 3.1 project. I know that there are breaking changes from EF Core 2 to 3 but searching for the solution to this is leading me places that make no sense.
The following works in .NetCore 2.2.
I have a list of user-names that is generated from other queries. I now want to find those user-names in our personnel database with the goal of returning the associated email address for each user-name. A person may elect to use a company email address or supply a different address. If the person.EmailAddress field is empty then the address I need is the username with the company domain appended.
private static List<string> GetEmailAddrsFromBp(PersonnelContext personnelContext, IEnumerable<string> userNames) {
    try {
        var personEmail = (
            from person in personnelContext.Persons
            join userName in userNames
                on person.userName.Trim().ToLower() equals userName.Trim().ToLower()
            where person.ActualEndDate == null
            select person.EmailAddress.Trim().Equals("")
                ? person.userName.Trim().ToLower() + "@myCompany.com"
                : person.EmailAddress.Trim().ToLower()
        ).Distinct().OrderBy(a => a).ToList();
        return personEmail;
    } catch (Exception e) {
        throw new Exception("GetEmailAddrsFromBp: " + e.Message);
    }
}
In 3.1 I get the exception:
Processing of the LINQ expression 'DbSet<Persons>
.Join(
outer: __p_0,
inner: person => person.userName.Trim().ToLower(),
outerKeySelector: userName => userName.Trim().ToLower(),
innerKeySelector: (person, userName) => new {
person = person,
userName = userName
})' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core. See https://go.microsoft.com/fwlink/?linkid=2101433 for more detailed information.
I do not understand this error. Going to the suggested Microsoft site is not helpful. Other googling has proven unhelpful. What is going on? How do you do "simple" joins now?
I do not understand this error.
The error message of course is not user friendly. The only relevant part is
This may indicate either a bug or a limitation in EF Core.
which can safely be read as "This is either a bug or a limitation in EF Core."
What is going on? How do you do "simple" joins now?
You can do "simple" joins, but not joins to in-memory collections. In fact, joins to in-memory collections were never really supported; EF Core 1.x / 2.x simply used so-called client evaluation for the things it could not translate. But implicit client evaluation was removed in 3.0, and now you are supposed to either find a translatable construct or switch explicitly to client evaluation through LINQ to Objects (or System.Linq.Async).
Since switching to client evaluation is inefficient for joins specifically, it's better to find a translatable query construct. If you use a non-equi or multi-key join, you basically have no option. But for a single-key equi-join there is a construct that is supported in all EF / EF Core versions: Enumerable.Contains, which translates to SQL IN (val1, val2, ..., valN).
So the solution for your concrete case would be something like this:
userNames = userNames.Select(userName => userName.Trim().ToLower()).Distinct();
var personEmail = (
from person in personnelContext.Persons
where userNames.Contains(person.userName.Trim().ToLower())
// the rest of the query unchanged...
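To see the Contains-based filter end to end, here is a minimal sketch that runs against an in-memory array with LINQ to Objects instead of a real PersonnelContext. The sample persons, the "@myCompany.com" suffix and the normalizedNames variable are assumptions made for illustration; materializing the normalized names into a list is a choice made here, not something the answer above requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for personnelContext.Persons.
var persons = new[]
{
    (userName: " Alice ", EmailAddress: "", ActualEndDate: (DateTime?)null),
    (userName: "bob", EmailAddress: "Bob@Elsewhere.org ", ActualEndDate: (DateTime?)null),
    (userName: "carol", EmailAddress: "", ActualEndDate: (DateTime?)new DateTime(2019, 1, 1)),
};
IEnumerable<string> userNames = new[] { "ALICE", "bob ", "carol", "dave" };

// Normalize the in-memory names once, so the query itself only needs Contains.
var normalizedNames = userNames
    .Select(userName => userName.Trim().ToLower())
    .Distinct()
    .ToList();

var personEmail = persons
    // Replaces the join: keep persons whose normalized user name is in the list.
    .Where(person => normalizedNames.Contains(person.userName.Trim().ToLower()))
    // Only persons without an end date, as in the original query.
    .Where(person => person.ActualEndDate == null)
    // Fall back to user name plus company domain when no address is stored.
    .Select(person => person.EmailAddress.Trim().Equals("")
        ? person.userName.Trim().ToLower() + "@myCompany.com"
        : person.EmailAddress.Trim().ToLower())
    .Distinct()
    .OrderBy(a => a)
    .ToList();

foreach (var address in personEmail)
{
    Console.WriteLine(address);
}
```

In this sample, carol is dropped because she has an end date and dave because he is not in the personnel data, so only two addresses remain.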

EF Core with Postgres - Poor performance compared to same query as raw SQL

I'm trying to diagnose the exact issue with a query that I wrote in C# against a Postgres DB that I've generated a context for in a .NET Core WebAPI project with Scaffold-DbContext.
I expected similar speeds for both queries, but a query that gives me the same result set runs much faster when I use the Postgres ODBC driver or PgAdmin. Why does the "plain SQL" version perform so much better?
My query in SQL:
SELECT oolk.salesrep, SUM(oool.openqty)
FROM public.oolookup oolk
INNER JOIN public.ooorderlines oool ON oool.orderlinekey = oolk.orderlinekey
WHERE oolk.salesteam = 'Team1' AND oolk.categorycode = 'Category 8'
GROUP BY oolk.salesrep
Running this query via ODBC and returning the result as JSON via WebAPI (localhost) takes: 2216ms
The "same" query in C#:
(from oolk in db.Oolookup
join oool in db.Ooorderlines on oolk.Orderlinekey equals oool.Orderlinekey
where oolk.Salesteam == "Team1"
where oolk.Categorycode == "Category 8"
group new { oolk, oool } by oolk.Salesrep into g
select new
{
SalesRep = g.Key,
OpenQty = g.Sum(gr => gr.oool.Openqty)
}).ToList()
Running this query and returning the result as JSON via WebAPI (localhost) takes: 8353ms
When I use DB logging in my C# code, this is the query that appears to get sent to my PG database by the query expression.
SELECT "oolk0"."pk_oolookup", "oolk0"."categorycode", "oolk0"."customercode", "oolk0"."customerdiv", "oolk0"."customergroup", "oolk0"."forecastgroup", "oolk0"."orderclass", "oolk0"."orderkey", "oolk0"."orderlinekey", "oolk0"."productcode", "oolk0"."saleslocation", "oolk0"."salesrep", "oolk0"."salesteam", "oolk0"."shippinglocation", "oool0"."orderlinekey", "oool0"."openamount", "oool0"."openappliedheadercharges", "oool0"."openitemamount", "oool0"."openlinechargeamount", "oool0"."openqty", "oool0"."openvolume", "oool0"."openweight", "oool0"."totalamount", "oool0"."totalheaderchargeamount", "oool0"."totalitemamount", "oool0"."totalqty", "oool0"."totalvolume", "oool0"."totalweight"
FROM "oolookup" AS "oolk0"
INNER JOIN "ooorderlines" AS "oool0" ON "oolk0"."orderlinekey" = "oool0"."orderlinekey"
WHERE ("oolk0"."salesteam" = 'Team1') AND ("oolk0"."categorycode" = 'Category 8')
ORDER BY "oolk0"."salesrep"
I find a few things about this strange. First of all, I'm never specifying that I want to select so many columns from the database, like I have to do with "Plain SQL". Yet, here they are. Secondly, this looks like only half of the "real query". I can't see any "SQL" that does a summation, so I assume this is happening internally in .NET objects and not on the database.
For indexes, I am building a data warehouse with a star schema. So my join field/primary key on the ooorderlines table is indexed (single column) and every column on my oolookup table has an index on it (single column). So I don't suspect an indexing issue at play. I am chalking this up to my inexperience with query expressions in .NET.
Where is the difference in six seconds coming from?

Entity Framework LINQ for finding sub items from LastOrDefault parent

I have a few related objects, and the relation between them looks like this:
public class Project
{
public List<ProjectEdition> editions;
}
public class ProjectEdition
{
public List<EditionItem> items;
}
public class EditionItem
{
}
I want to fetch the EditionItems from only the last ProjectEdition of each Project.
Example:
Project#1 -> Edition#1 [contains a few edition items], Edition#2 [contains a few edition items]
Project#2 -> Edition#1, Edition#2 and Edition#3
My required output contains the EditionItems from Edition#2 of Project#1 and Edition#3 of Project#2 only; that is, the EditionItems from the latest (last) edition of each Project.
To get this, I tried this query:
List<EditionItem> master_list = context.Projects.Select(x => x.ProjectEditions.LastOrDefault())
.SelectMany(x => x.EditionItems).ToList();
But it returns an error at the LastOrDefault() call:
An exception of type 'System.NotSupportedException' occurred in EntityFramework.SqlServer.dll but was not handled in user code
Additional information: LINQ to Entities does not recognize the method '---------.Models.ProjectEdition LastOrDefault[ProjectEdition](System.Collections.Generic.IEnumerable`1
So how can I filter for the last edition of a project and then get the list of EditionItems from it in a single LINQ call?
Granit got the answer right, so I won't repeat his code. I would like to add the reasons for this behaviour.
Entity Framework is magic (sometimes too much magic), but it still translates your LINQ queries into SQL, and it is limited by what your underlying database can do (SQL Server in this case).
When you call context.Projects.FirstOrDefault(), it is translated into something like Select TOP 1 * from Projects. Note the TOP 1 part: this is a SQL Server operator that limits the number of rows returned. This is part of query optimisation in SQL Server. SQL Server does not have any operator that will give you LAST 1, because it would need to run the query, return all the results, take the last one and dump the rest. This is not very efficient; think of a table with a couple (bi)million records.
So you need to apply whatever sort order is required to your query and limit the number of rows you return. If you need the last record from the query, apply the reverse sort order. You do need to sort, because SQL Server does not guarantee the order of records returned if no Order By is applied to the query; this is due to the way the data is stored internally.
When you write LINQ queries with EF, I do recommend keeping an eye on the SQL generated by your queries. Sometimes you'll see how complex it comes out, and you can easily simplify the query. And sometimes, with lazy loading enabled, you introduce an N+1 problem with a stroke of a key (literally). I use ExpressProfiler to watch the generated SQL; LINQPad can also show you the SQL queries, and there are other tools.
You cannot use the method LastOrDefault() or Last(), as discussed here.
Instead, you can use OrderByDescending() in conjunction with FirstOrDefault(), but first you need to have a property in your ProjectEdition by which you want to order the entities. E.g. if ProjectEdition has a property Id (which there is a good chance it does), you can use the following LINQ query:
List<EditionItem> master_list = context.Projects.Select(
x => x.ProjectEditions
.OrderByDescending(pe => pe.Id)
.FirstOrDefault())
.SelectMany(x => x.EditionItems).ToList();
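The reverse-sort idea can be tried out in memory with LINQ to Objects. The sketch below is a variant, not the query above: it uses Take(1) after OrderByDescending rather than FirstOrDefault(), so a project without any editions simply contributes no items instead of producing a null. The data, the tuple shapes and the Id ordering key are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: each project is an array of editions (Id plus item names).
var projects = new[]
{
    new[]
    {
        (Id: 1, Items: new[] { "p1-e1-a" }),
        (Id: 2, Items: new[] { "p1-e2-a", "p1-e2-b" }),
    },
    new[]
    {
        (Id: 3, Items: new[] { "p2-e1-a" }),
        (Id: 4, Items: new[] { "p2-e2-a" }),
        (Id: 5, Items: new[] { "p2-e3-a" }),
    },
    // A project with no editions at all.
    Array.Empty<(int Id, string[] Items)>(),
};

List<string> masterList = projects
    // Sort each project's editions newest first and keep at most one.
    .SelectMany(editions => editions.OrderByDescending(e => e.Id).Take(1))
    // Flatten the items of the edition that was kept.
    .SelectMany(latest => latest.Items)
    .ToList();

Console.WriteLine(string.Join(", ", masterList));
```

Whether a given EF version translates the Take(1) form as well as the FirstOrDefault() form is something to confirm by looking at the generated SQL.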
List<EditionItem> master_list = context.Projects
.Select(p => p.editions.LastOrDefault())
.SelectMany(pe => pe.items).ToList();
If LastOrDefault is not supported, you can try using OrderByDescending:
List<EditionItem> master_list = context.Projects
.Select(p => p.editions.OrderByDescending(e => e.somefield).FirstOrDefault())
.SelectMany(pe => pe.items).ToList();
from p in context.project
from e in p.projectEdition.LastOrDefault()
select new EditionItem
{
item1 = e.item1
}
Please try this

Update Asp.net app to Npgsql 3 and removing Preload Reader

I have updated my ASP.NET app from Npgsql 2.2.5 to 3.0.1. The breaking changes specify that Preload Reader support has been removed, so I removed it from the connection string.
Testing my web app, I got the error "An operation is already in progress.", especially in LINQ queries like this one:
var plugins =
from p in _pluginRepository.GetPlugins() // this method return this: GetAll().OrderBy(p => p.Created)
join e in _userPluginRepository.GetByUserId(user.Id).ToList() on p.Id equals e.Plugin.Id into pe
from e in pe.DefaultIfEmpty()
select new PluginViewModel
{
Active = e != null,
Name = p.Translations.ToUserLanguage(loggedInUser),
Key = p.Key,
PluginId = p.Id,
SettingId = e == null ? 0 : e.Id,
ExpireDate = e != null && e.ExpireDate.HasValue ? e.ExpireDate.Value : (DateTime?) null,
Grants = e == null ? UserPluginGrants.None.GetHashCode().ToString() : e.Grants.GetHashCode().ToString()
};
To solve this error, I have to append a ToList after the GetPlugins method.
Is this the correct behavior to use without Preload Reader? Why?
In Npgsql 2.x, using the Preload Reader made Npgsql pull the entire result set of the query from the database into the memory of your application. This freed the connection and allowed another command to be executed while still traversing the resultset of the first query. In other words, it allowed you to program as if you could execute multiple queries concurrently (known sometimes as MARS), although behind the scenes this was implemented inefficiently.
Adding a ToList() does exactly the same thing - pull everything into client memory, only it happens in your application code instead of in the database driver. So it's definitely an acceptable way to port your application from Npgsql 2.x to 3.x.
Now, if the result set being pulled (in this case GetPlugins) is small, this is a perfectly valid approach. If it's big, however you should look into alternatives. In your example, the join could be sent to the database, making your Linq expression translate into a single SQL query and eliminating the need for multiple queries (ORMs such as Entity Framework can usually do this for you). A more extreme solution would be to use multiple database connections, but that is heavier and also problematic if you're using transactions.
Note that there's an issue open for implementing true MARS in Npgsql (although it isn't likely to be implemented very soon): https://github.com/npgsql/npgsql/issues/462

How do I update multiple Entity models in one SQL statement?

I had the following:
List<Message> unreadMessages = this.context.Messages
.Where( x =>
x.AncestorMessage.MessageID == ancestorMessageID &&
x.Read == false &&
x.SentTo.Id == userID ).ToList();
foreach(var unreadMessage in unreadMessages)
{
unreadMessage.Read = true;
}
this.context.SaveChanges();
But there must be a way of doing this without having to do 2 SQL queries, one for selecting the items, and one for updating the list.
How do I do this?
Current idiomatic support in EF
As far as I know, there is no direct support for "bulk updates" yet in Entity Framework (there has been an ongoing discussion for bulk operation support for a while though, and it is likely it will be included at some point).
(Why) Do you want to do this?
It is clear that this is an operation that, in native SQL, can be achieved in a single statement, and provides some significant advantages over the approach followed in your question. Using the single SQL statement, only a very small amount of I/O is required between client and DB server, and the statement itself can be completely executed and optimized by the DB server. No need to transfer to and iterate through a potentially large result set client side, just to update one or two fields and send this back the other way.
How
So although not directly supported by EF, it is still possible to do this, using one of two approaches.
Option A. Handcode your SQL update statement
This is a very simple approach, that does not require any other tools/packages and can be performed Async as well:
var sql = "UPDATE x SET FIELDA = @fieldA WHERE FIELDB = @fieldb";
var parameters = new SqlParameter[] { ..., ... };
int result = db.Database.ExecuteSqlCommand(sql, parameters);
or
int result = await db.Database.ExecuteSqlCommandAsync(sql, parameters);
The obvious downside is, well, breaking the nice linqy paradigm and having to handcode your SQL (possibly for more than one target SQL dialect).
Option B. Use one of the EF extension/utility packages
For a while now, a number of open source NuGet packages have been available that offer specific extensions to EF. A number of them do provide a nice "linqy" way to issue a single update SQL statement to the server. Two examples are:
Entity Framework Extended Library that allows performing a bulk update using a statement like:
context.Messages.Update(
x => x.Read == false && x.SentTo.Id == userID,
x => new Message { Read = true });
It is also available on GitHub.
EntityFramework.Utilities that allows performing a bulk update using a statement like:
EFBatchOperation
.For(context, context.Messages)
.Where(x => x.Read == false && x.SentTo.Id == userID)
.Update(x => x.Read, x => x.Read = true);
It is also available on GitHub.
And there are definitely other packages and libraries out there that provide similar support.
Even SQL has to do this in two steps in a sense, in that an UPDATE query with a WHERE clause first runs the equivalent of a SELECT behind the scenes, filtering via the WHERE clause, then applying the update. So really, I don't think you need to be worried about improving this.
Further, the reason why it's broken into two steps like this in LINQ is precisely for performance reasons. You want that "select" to be as minimal as possible, i.e. you don't want to load any more objects from the database into in memory objects than you have to. Only then do you alter objects (in the foreach).
If you really want to run a native UPDATE on the SQL side, you could use a System.Data.SqlClient.SqlCommand to issue the update, instead of having LINQ give you back objects that you then update. That will be faster, but then you conceptually move some of your logic out of your C# code object model space into the database model space (you are doing things in the database, not in your object space), even if the SqlCommand is being issued from your code.
