Entity Framework SQL Query Execution - c#

Using the Entity Framework, when one executes a query on, let's say, 2000 records requiring a GroupBy and some other calculations, does the query get executed on the server with only the results sent over to the client, or is everything sent over to the client and then executed there?
This is using SQL Server.
I'm looking into this because I'm going to be starting a project where there will be loads of queries required on a huge database, and I want to know whether using the Entity Framework will produce a significant load on the network.

I would think all database querying is done on the server side (where the database is!) and only the results are passed back. However, LINQ has deferred execution, so your information isn't actually retrieved until you enumerate the query, e.g. by calling ToList(). With lazy loading enabled, accessing a navigation property (a related table) also triggers a separate query.
If you require eager loading, LINQ to SQL offers LoadWith (on DataLoadOptions); the Entity Framework equivalent is Include.
So in terms of performance, if you really only want to make one trip to the database for a query that involves related tables, I would advise eager loading. However, it really does depend on the particular situation.
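A minimal sketch of deferred execution. It assumes no database: an in-memory List<int> is wrapped with AsQueryable() as a stand-in for an EF DbSet, so the work happens in memory, but the timing is the same. Defining the query only builds an expression tree; nothing runs until ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for a DbSet (assumption: no EF or database involved).
var rows = new List<int> { 1, 2, 3 };
IQueryable<int> table = rows.AsQueryable();

// Defining the query does not execute it; it only builds an expression tree.
IQueryable<int> bigOnes = table.Where(x => x > 1);
Console.WriteLine(bigOnes.Expression); // prints the Where(...) call, not any data

// Data added after the query was defined is still seen, because nothing has run yet.
rows.Add(4);

// ToList() is the point of execution (with EF, this is where the SQL is sent).
List<int> result = bigOnes.ToList();
Console.WriteLine(string.Join(",", result)); // 2,3,4
```

With a real EF context, the ToList() call is where the SQL is generated and sent to the server.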

It's always executed on SQL Server, which means every part of the query has to be translatable to SQL. So sometimes you have to change this:
from q in ctx.Bar
where q.Id == new Guid(someString)
select q
to
Guid g = new Guid(someString);
from q in ctx.Bar
where q.Id == g
select q
This is because the Guid constructor call cannot be translated to SQL, whereas g is just a value that can be passed into the query.
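To see why the first version is a problem, you can inspect the expression trees yourself. The sketch below uses plain System.Linq.Expressions with no EF (the GUID string is an arbitrary example value) and compares the right-hand side of the two comparisons: the inline version contains a constructor-call node that the provider would have to translate, while the hoisted version only references a captured variable, which EF typically sends as a SQL parameter.

```csharp
using System;
using System.Linq.Expressions;

string someString = "6f9619ff-8b86-d011-b42d-00cf4fc964ff"; // arbitrary example value

// Inline constructor: the tree contains a NewExpression the provider must translate.
Expression<Func<Guid, bool>> inline = id => id == new Guid(someString);

// Hoisted value: the tree only reads a captured variable (a closure field).
Guid g = new Guid(someString);
Expression<Func<Guid, bool>> hoisted = id => id == g;

Expression inlineRight = ((BinaryExpression)inline.Body).Right;
Expression hoistedRight = ((BinaryExpression)hoisted.Body).Right;

Console.WriteLine(inlineRight.NodeType);  // New
Console.WriteLine(hoistedRight.NodeType); // MemberAccess
```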

SQL's GROUP BY and LINQ's GroupBy return differently shaped results.
SQL's GROUP BY returns keys and aggregates (no group members).
LINQ's GroupBy returns keys and group members.
If you use those group members, they must be (re-)fetched by the grouping key. This can result in one extra database round trip per group.
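A sketch of the shape that maps cleanly onto SQL GROUP BY: project each group to its key and aggregates instead of returning the group members. It assumes hypothetical in-memory sample data (Customer and Amount are made-up fields) wrapped with AsQueryable() as a stand-in for a DbSet.

```csharp
using System;
using System.Linq;

// Hypothetical sample orders; AsQueryable() stands in for a DbSet (no database involved).
var orders = new[]
{
    new { Customer = "alice", Amount = 10m },
    new { Customer = "bob", Amount = 5m },
    new { Customer = "alice", Amount = 7m },
}.AsQueryable();

// Key + aggregates only: the same shape as SQL's GROUP BY ... SUM(...), COUNT(*),
// so a provider can answer it with one row per group.
var totals = orders
    .GroupBy(o => o.Customer)
    .Select(g => new { Customer = g.Key, Total = g.Sum(o => o.Amount), Count = g.Count() })
    .OrderBy(t => t.Customer)
    .ToList();

foreach (var t in totals)
    Console.WriteLine($"{t.Customer}: {t.Total} across {t.Count} orders");
```

Because the projection never hands the individual orders back to the client, nothing has to be re-fetched per group.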

Well, I had the same question some time ago.
Basically, your LINQ statement is converted to a SQL statement. However, some parts will get translated and others will not, depending on how you write your statement.
So yes, both are possible.
Example:
var a = (from entity in myTable where entity.Property == 1 select entity).ToList();
versus
var a = (from entity in myTable.ToList() where entity.Property == 1 select entity).ToList();
The first filters in SQL and returns only the matching rows; the second loads the entire table into memory with myTable.ToList() and then filters on the client.

Related

Compare local list to DataBase

I have a local List with entities, some hundreds, and I have a SQL Server table where I store the IDs of the successfully processed entities, some millions. I would like to know which entities from my local set are not yet processed, i.e. are not in the SQL table.
The first approach is to iterate through the local list with the following Linq statement:
Entity entity = db.Entities.FirstOrDefault(m => m.ID == ID);
if (entity == null) { NewList.Add(ID); }
NewList would then contain all the new entities. However, this is very slow, since it makes one database round trip per local entity.
In LINQ, how would you send the entire local list to the SQL Server with one call and then return the ones not in the SQL table?
Do you really have to create a temporary table with my local list, then left-join on the already processed table and return the ones with a null?
Use .Contains method to retrieve already processed ids
and Except to create list of not yet processed ids.
var localList = new List<int> { 1, 2, 3 };
var processed = db.Entities
.Where(entity => localList.Contains(entity.Id))
.Select(entity => entity.Id)
.ToList();
var notProcessed = localList.Except(processed).ToList();
It depends on the provider, but .Contains should generate SQL like:
SELECT Id FROM Entity WHERE Id IN (1, 2, 3)
suggestion:
create a temp table and insert your IDs
select your result on the SQL side
EDIT:
"Can you do that in LINQ?"
TL;DR:
Yes*, but it's an ugly piece of work; write the SQL yourself.
*) It depends on what you mean by "in" LINQ, because this is not in the scope of LINQ. In other words, a LINQ expression is one layer too abstract, but if you happen to have a LINQ-accessible implementation for this, you can use it in your LINQ statements.
On the LINQ expression side you have something like:
List<int> lst = new List<int>() { 1,2,3 };
List<int> result = someQueryable.Where(x=>lst.Contains(x.ID)).Select(x=>x.ID).ToList();
The question now is: what happens on the SQL side (assuming the queryable leads us to a SQL database)?
The queryable provider (e.g. Entity Framework) has to translate that into SQL, execute it and come back with the result.
That would be the place to modify the translation,
for example by examining the expression tree for the object that is the target of the Contains(...) call and, if it holds more than just a few elements, switching to the temp table approach.
The very same LINQ expression can be translated into different SQL commands; the provider decides how the translation is done.
If your provider lacks support for large Contains(...) cases, you will probably experience poor performance. The good thing is that usually nobody forces you to use it this way: you can skip LINQ for performance-optimized queries, or you could write a provider extension yourself, but then you are not on the "doing something with LINQ" side; you are extending the functionality of your LINQ provider.
If you are not developing a large, scalable product that will be deployed against different database back ends, it is usually not worth the effort. The easier way to go is to write the SQL yourself and just use the raw SQL option of your database connection.
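For the Contains/Except answer above, one common workaround when the local list is large (not taken from the answers themselves) is to query in batches, so each generated IN (...) list stays short; with providers that send every value as a separate parameter, this also keeps each command below SQL Server's limit of 2,100 parameters. In this sketch, QueryProcessed is a hypothetical helper standing in for the EF query, backed by an in-memory HashSet so it runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the SQL table of processed IDs.
var processedInDb = new HashSet<int> { 2, 4, 6, 8 };

// Hypothetical stand-in for one round trip such as:
//   db.Entities.Where(e => batch.Contains(e.Id)).Select(e => e.Id).ToList()
List<int> QueryProcessed(List<int> batch) =>
    processedInDb.Where(id => batch.Contains(id)).ToList();

// Issues one Contains query per batch, then returns the local IDs that were not found.
List<int> FindNotProcessed(IEnumerable<int> localIds, int batchSize)
{
    var ids = localIds.Distinct().ToList();
    var processed = new HashSet<int>();
    for (int i = 0; i < ids.Count; i += batchSize)
    {
        var batch = ids.GetRange(i, Math.Min(batchSize, ids.Count - i));
        processed.UnionWith(QueryProcessed(batch));
    }
    return ids.Where(id => !processed.Contains(id)).ToList();
}

var notProcessed = FindNotProcessed(Enumerable.Range(1, 10), batchSize: 3);
Console.WriteLine(string.Join(",", notProcessed)); // 1,3,5,7,9,10
```

The batch size is a tuning knob: larger batches mean fewer round trips but longer IN lists.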

Entity Framework LINQ for finding sub items from LastOrDefault parent

I have a few related objects, and the relationships look like this:
public class Project
{
public List<ProjectEdition> editions;
}
public class ProjectEdition
{
public List<EditionItem> items;
}
public class EditionItem
{
}
I want to fetch the EditionItems from only the last ProjectEdition of each Project.
Example
Project#1 -> Edition#1 [contains few edition items ] , Edition#2 [contains few edition items]
Project#2 -> Edition#1 ,Edition#2 and Edition#3
My required output contains the EditionItems from Edition#2 of Project#1 and Edition#3 of Project#2 only, i.e. the EditionItems from the latest (last) edition of each project.
To get this I tried this query:
List<EditionItem> master_list = context.Projects.Select(x => x.ProjectEditions.LastOrDefault())
.SelectMany(x => x.EditionItems).ToList();
But it throws an error at the LastOrDefault() call:
An exception of type 'System.NotSupportedException' occurred in EntityFramework.SqlServer.dll but was not handled in user code
Additional information: LINQ to Entities does not recognize the method '---------.Models.ProjectEdition LastOrDefault[ProjectEdition](System.Collections.Generic.IEnumerable`1
So how can I filter for the last edition of a project and then get the list of its EditionItems in a single LINQ call?
Granit got the answer right, so I won't repeat his code. I would like to add the reasons for this behaviour.
Entity Framework is magic (sometimes too much magic), but it still translates your LINQ queries into SQL, and it is limited by what your underlying database (SQL Server in this case) can do.
When you call context.Projects.FirstOrDefault() it is translated into something like SELECT TOP 1 * FROM Projects. Note the TOP 1 part: this is a SQL Server operator that limits the number of rows returned, and it is part of query optimisation in SQL Server. SQL Server does not have an operator that gives you LAST 1, because it would need to run the query, return all the results, take the last one and dump the rest. That is not very efficient; think of a table with a couple of million (or billion) records.
So you need to apply whatever sort order you require to your query and limit the number of rows you return. If you need the last record from the query, apply the reverse sort order. You do need to sort, because SQL Server does not guarantee the order of the records returned if no ORDER BY is applied to the query; this is due to the way the data is stored internally.
When you write LINQ queries with EF, I recommend keeping an eye on the SQL your queries generate: sometimes you'll see how complex they come out, and you can easily simplify the query. And sometimes, with lazy loading enabled, you introduce the N+1 problem with a stroke of a key (literally). I use ExpressProfiler to watch the generated SQL; LINQPad can also show you the SQL queries, and there are other tools.
You cannot use LastOrDefault() or Last() in LINQ to Entities.
Instead, you can use OrderByDescending() in conjunction with FirstOrDefault(), but first you need a property on your ProjectEdition by which to order the entities. E.g. if ProjectEdition has an Id property (which there is a good chance it does), you can use the following LINQ query:
List<EditionItem> master_list = context.Projects.Select(
x => x.ProjectEditions
.OrderByDescending(pe => pe.Id)
.FirstOrDefault())
.SelectMany(x => x.EditionItems).ToList();
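A quick in-memory check of the OrderByDescending(...).FirstOrDefault() pattern. It assumes hypothetical sample data built from anonymous objects (each edition has an Id, and a higher Id means a later edition) and uses AsQueryable() as a stand-in for context.Projects. Project#2's editions are deliberately listed out of order to show that the sort, not the storage order, decides which edition counts as last.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data shaped like Project -> editions -> items.
var projects = new[]
{
    new { Name = "Project#1", Editions = new[]
    {
        new { Id = 1, Items = new[] { "p1-e1-a" } },
        new { Id = 2, Items = new[] { "p1-e2-a", "p1-e2-b" } },
    } },
    new { Name = "Project#2", Editions = new[]
    {
        new { Id = 3, Items = new[] { "p2-e1-a" } },
        new { Id = 5, Items = new[] { "p2-e3-a" } },
        new { Id = 4, Items = new[] { "p2-e2-a" } },
    } },
};

// "Last" edition = highest Id: sort descending and take the first one.
var latestItems = projects.AsQueryable()
    .Select(p => p.Editions.OrderByDescending(e => e.Id).FirstOrDefault())
    .Where(e => e != null)          // skip projects that have no editions
    .SelectMany(e => e.Items)
    .ToList();

Console.WriteLine(string.Join(", ", latestItems)); // p1-e2-a, p1-e2-b, p2-e3-a
```

Against SQL Server, EF would typically turn the inner OrderByDescending(...).FirstOrDefault() into a TOP (1) subquery with a descending ORDER BY, which is why this form is accepted while LastOrDefault() is not.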
List<EditionItem> master_list = context.Projects
.Select(p => p.editions.LastOrDefault())
.SelectMany(pe => pe.items).ToList();
If LastOrDefault is not supported, you can try using OrderByDescending instead:
List<EditionItem> master_list = context.Projects
.Select(p => p.editions.OrderByDescending(e => e.somefield).FirstOrDefault())
.SelectMany(pe => pe.items).ToList();
from p in context.Projects
let lastEdition = p.ProjectEditions.OrderByDescending(e => e.Id).FirstOrDefault()
from item in lastEdition.EditionItems
select item
Please try this.

At what point do I need to off-load the work of a query to the DB?

I have a web app, and I'm connecting to a DB with entity framework.
To select all Employee records of a particular department, for example, I can quite easily write:
....Employees.Where(o => o.Department == "HR").ToList();
Works fine. But is it most optimal?
Should this Where clause be incorporated into a stored procedure or view? Or does my entity framework code do the job of converting it to SQL anyway?
We've had performance problems in our team in the past from when people pull records into memory and then do the filtering in .net instead of at a database level. I'm trying to avoid this happening again so want to be crystal clear on what I must avoid.
If Employees is provided by Entity Framework, then the Where() will be translated into SQL and sent to the database. Only at the point where you materialise the objects are the filters you applied before that point turned into SQL. Anything after that point is just plain LINQ to Objects.
Methods that cause materialisation to happen include things like .ToList() and .ToArray() (there are more, but these two are probably the most common).
If you want to see what is happening on SQL Server, you should open up the SQL Profiler and have a look at the queries that are being sent.
We've had performance problems in our team in the past from when people pull records into memory and then do the filtering in .net instead of at a database level.
Just as an addendum to Colin's answer, and to target the quote above, the way to avoid this is to make sure your database queries are fully constructed with IQueryable<T> first, before enumerating the results with a call such as .ToList(), or .ToArray().
As an example, consider the following:
IEnumerable<Employee> employees = context.Employees;
// other code, before executing the following
var hrEmployees = employees.Where(o => o.Department == "HR").ToList();
The .ToList() will enumerate the results grabbing all of the employees from the context first, and then performing the filtering on the client. This isn't going to perform very well if you've got a lot of employees to contend with, and it's certainly not going to scale very well.
Compare that with this:
IQueryable<Employee> employees = context.Employees;
// other code, before executing the following
var hrEmployees = employees.Where(o => o.Department == "HR").ToList();
IQueryable<T> derives from IEnumerable<T>. The difference between them is that IQueryable<T> has a query provider built in, and the query you construct is represented as an expression tree. That means it's not evaluated until a call that enumerates the results of the query, such as .ToList().
In the second example above, the query provider will execute SQL to fetch only those employees that belong to the HR department, performing the filtering on the database itself.
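Which Where() runs is decided at compile time by the static type it is called on. The sketch below shows this with in-memory data wrapped by AsQueryable() as a stand-in for context.Employees (an assumption; there is no database here): the same lambda either becomes part of the query's expression tree (Queryable.Where) or a delegate run over rows that have already been fetched (Enumerable.Where).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for context.Employees (assumption: no EF or database involved).
var rows = new List<(string Name, string Department)>
{
    ("Ann", "HR"), ("Bob", "IT"), ("Cid", "HR"),
};

IQueryable<(string Name, string Department)> asQueryable = rows.AsQueryable();
IEnumerable<(string Name, string Department)> asEnumerable = asQueryable;

// Binds to Queryable.Where: the filter becomes part of the expression tree,
// which a provider such as EF would translate into a SQL WHERE clause.
var q1 = asQueryable.Where(o => o.Department == "HR");

// Binds to Enumerable.Where: the filter is a compiled delegate applied in memory
// to every row the source returns.
var q2 = asEnumerable.Where(o => o.Department == "HR");

Console.WriteLine(q1 is IQueryable); // True
Console.WriteLine(q2 is IQueryable); // False
Console.WriteLine(q1.Expression.ToString().Contains("Where")); // True
```

Both return the same two rows here; with a real provider, the difference is whether the filtering happens in the database or on the client.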

LINQ EF Join query from 2 different data context

I am using LINQ to retrieve data from my EF context as well as from ASP.NET Identity 2.0, both located in the same MS SQL Server database.
My problem is that LINQ sees them as two different data contexts and is unable to process the query:
"The specified LINQ expression contains references to queries that are associated with different contexts."
What I want to achieve is simply to return the top 10 items (I skip this in the code extract) from an EF table, sorted by the UserName from the ASP.NET Identity table.
I have seen a few cases of this problem on StackOverflow but I was unable to apply any solution in my case.
The preferred solution would, of course, not download all of the table data and do the sorting in application memory.
The query in question:
var dbDataSorted = from entry in dbData
join user in this.UserManager.Users
on entry.UserId equals new Guid(user.Id)
orderby user.UserName ascending
select entry;
return dbDataSorted;
I was able to get this to work in my case by using AsEnumerable(). YMMV.
In your case:
var dbDataSorted = from entry in dbData.AsEnumerable()
join user in this.UserManager.Users
on entry.UserId equals new Guid(user.Id)
orderby user.UserName ascending
select entry;
return dbDataSorted;
LINQ and EF are pretty cool. But sometimes, its abstractions don't offer what you need.
Just fall back to base, write the query by hand, put it in a string, run it against yourcontext.YourDbSet with the SqlQuery method, and be done with it.
var query = @"SELECT entry.* FROM dbData AS entry
INNER JOIN Users
ON entry.UserId = Users.Id
ORDER BY Users.UserName";
var sorted = yourcontext.dbData.SqlQuery(query).ToList();
If the abstractions offered to you don't work with what you need, abusing the abstractions to do something weird is far less clear than using the lower level interface.
You can't use LINQ to Entities and LINQ to Objects in one query. When you query an instance of the IQueryable interface, the query is transformed to SQL, and SQL can't use an in-memory IEnumerable collection. So you have to get the data from the database first (for example with dbData.AsEnumerable()).
But if you want to do all the work on the SQL Server side, the easiest way is to create a stored procedure and pass the users table as a parameter, for example as XML that the procedure then parses on the server side.
If you're using ASP.NET Identity 2 code-first, then your DbContext presumably inherits from IdentityDbContext<>, and so you can access the Users table directly from the context. You don't need to use UserManager to access the users.

Select clause containing non-EF method calls

I'm having trouble building an Entity Framework LINQ query whose select clause contains method calls to non-EF objects.
The code below is part of an app used to transform data from one DBMS into a different schema on another DBMS. In the code below, Role is my custom class unrelated to the DBMS, and the other classes are all generated by Entity Framework from my DB schema:
// set up ObjectContext's for Old and new DB schemas
var New = new NewModel.NewEntities();
var Old = new OldModel.OldEntities();
// cache all Role names and IDs in the new-schema roles table into a dictionary
var newRoles = New.roles.ToDictionary(row => row.rolename, row => row.roleid);
// create a list of Role objects where Name is the name in the old DB, while
// ID is the ID corresponding to that name in the new DB
var roles = from rl in Old.userrolelinks
join r in Old.roles on rl.RoleID equals r.RoleID
where rl.UserID == userId
select new Role { Name = r.RoleName, ID = newRoles[r.RoleName] };
var list = roles.ToList();
But calling ToList gives me this NotSupportedException:
LINQ to Entities does not recognize the method 'Int32 get_Item(System.String)' method, and this method cannot be translated into a store expression
Sounds like LINQ-to-Entities is barfing on my call to pull the value out of the dictionary given the name as a key. I admittedly don't understand enough about EF to know why this is a problem.
I'm using Devart's dotConnect for PostgreSQL Entity Framework provider, although I assume at this point that this is not a DBMS-specific issue.
I know I can make it work by splitting up my query into two queries, like this:
var roles = from rl in Old.userrolelinks
join r in Old.roles on rl.RoleID equals r.RoleID
where rl.UserID == userId
select r;
var roles2 = from r in roles.AsEnumerable()
select new Role { Name = r.RoleName, ID = newRoles[r.RoleName] };
var list = roles2.ToList();
But I was wondering if there was a more elegant and/or more efficient way to solve this problem, ideally without splitting it in two queries.
Anyway, my question is two parts:
First, can I transform this LINQ query into something that Entity Framework will accept, ideally without splitting into two pieces?
Second, I'd also love to understand a little about EF so I can understand why EF can't layer my custom .NET code on top of the DB access. My DBMS has no idea how to call a method on a Dictionary class, but why can't EF simply make those Dictionary method calls after it's already pulled data from the DB? Sure, if I wanted to compose multiple EF queries together and put custom .NET code in the middle, I'd expect that to fail, but in this case the .NET code is only at the end, so why is this a problem for EF? I assume the answer is something like "that feature didn't make it into EF 1.0" but I am looking for a bit more explanation about why this is hard enough to justify leaving it out of EF 1.0.
The problem is that in using Linq's delayed execution, you really have to decide where you want the processing and what data you want to traverse the pipe to your client application. In the first instance, Linq resolves the expression and pulls all of the role data as a precursor to
New.roles.ToDictionary(row => row.rolename, row => row.roleid);
At that point, the data moves from the DB into the client and is transformed into your dictionary. So far, so good.
The problem is that your second LINQ expression asks the provider to do the transform on the second database using the dictionary, which only exists on the client. In other words, it is trying to figure out a way to pass the entire dictionary structure to the DB so that it can select the correct ID value as part of the deferred execution of the query. I suspect that it would resolve just fine if you altered the second half to:
var roles = from rl in Old.userrolelinks
join r in Old.roles on rl.RoleID equals r.RoleID
where rl.UserID == userId
select r.RoleName;
var list = roles.ToDictionary(roleName => roleName, roleName => newRoles[roleName]);
That way, it resolves your select on the DB (selecting just the role name) as a precursor to processing the ToDictionary call (which it should do on the client, as you'd expect). This is essentially what you are doing in your second example, because AsEnumerable pulls the data to the client before it is used in the ToList call. You could just as easily change it to something like:
var roles = from rl in Old.userrolelinks
join r in Old.roles on rl.RoleID equals r.RoleID
where rl.UserID == userId
select r;
var list = roles.AsEnumerable().Select(r => new Role { Name = r.RoleName, ID = newRoles[r.RoleName] }).ToList();
and it'd work out the same. The call to AsEnumerable() resolves the query, pulling the data to the client for use in the Select that follows it.
Note that I haven't tested this, but as far as I understand Entity Framework, that's my best explanation for what's going on under the hood.
Jacob is totally right.
You can not transform the desired query without splitting it in two parts, because Entity Framework is unable to translate the get_Item call into the SQL query.
The only way is to write the LINQ to Entities query and then write a LINQ to Objects query to its result, just as Jacob advised.
The problem is specific to Entity Framework itself; it does not arise from our implementation of the Entity Framework support.
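Looking at the expression tree makes the error message concrete. The sketch below uses plain System.Linq.Expressions with no EF (the role names and IDs are made-up sample values). It captures the dictionary lookup from the question as an expression tree, shows that the indexer appears as a call to get_Item, and then applies the two-stage fix by doing the lookup after AsEnumerable(). An in-memory array wrapped by AsQueryable() stands in for the Old.roles query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Made-up role names and IDs standing in for the new-schema roles table.
var newRoles = new Dictionary<string, int> { ["admin"] = 10, ["user"] = 20 };

// The select part of the question's query, captured as an expression tree (what EF receives).
Expression<Func<string, int>> lookup = roleName => newRoles[roleName];

// The dictionary indexer is represented as a call to Dictionary<string, int>.get_Item,
// a .NET method with no SQL equivalent, hence the NotSupportedException.
var call = (MethodCallExpression)lookup.Body;
Console.WriteLine(call.Method.Name); // get_Item

// Two-stage version: the query part stays an IQueryable (stand-in for the EF query),
// AsEnumerable() switches to LINQ to Objects, and the lookup runs in memory.
IQueryable<string> oldRoleNames = new[] { "admin", "user" }.AsQueryable();
var roles = oldRoleNames
    .AsEnumerable()
    .Select(name => new { Name = name, ID = newRoles[name] })
    .ToList();

foreach (var r in roles)
    Console.WriteLine($"{r.Name} -> {r.ID}");
```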
