Linq 2 Entities: Subquery based on query result

Linq 2 Entities: Subquery based on query result - c#

I moved from Linq 2 SQL to Linq 2 Entities due to performance reasons for complex queries.
It's a little bit frustrating to get everything working.
In L2SQL I can run subqueries without any problems.
Now I figured out, that in L2E I have to place the subquery outside the query.
That's usually no problem, but how can I run a query if the subquery depends on the query?
For example:
I want to get the status of an document but for that I need the current documentID.
var query = from Dok in dbContext.Dokumente
orderby Dok.DokumentID descending
select new ClassDok() {
_Nr = Dok.Dokumentnr,
_StatusID = dbContext.Database.SqlQuery<int>("GetDokStatusID", Dok.DokumentID).Single()
};
I already tried to put
dbContext.Database.SqlQuery<int>("GetDokStatusID", Dok.DokumentID).Single()
into a method and return integer, but it doesn't work.

Related

SQL generated from LINQ not consistent

I am using Telerik Open/Data Access ORM against an ORACLE.
Why do these two statements result in different SQL commands?
Statement #1
IQueryable<WITransmits> query = from wiTransmits in uow.DbContext.StatusMessages
select wiTransmits;
query = query.Where(e=>e.MessageID == id);
Results in the following SQL
SELECT
a."MESSAGE_ID" COL1,
-- additional fields
FROM "XFE_REP"."WI_TRANSMITS" a
WHERE
a."MESSAGE_ID" = :p0
Statement #2
IQueryable<WITransmits> query = from wiTransmits in uow.DbContext.StatusMessages
select new WITransmits
{
MessageID = wiTranmits.MessageID,
Name = wiTransmits.Name
};
query = query.Where(e=>e.MessageID == id);
Results in the following SQL
SELECT
a."MESSAGE_ID" COL1,
-- additional fields
FROM "XFE_REP"."WI_TRANSMITS" a
The query generated with the second statement #2 returns, obviously EVERY record in the table when I only want the one. Millions of records make this prohibitive.

Telerik Data Access will try to split each query into database-side and client-side (or in-memory LINQ if you prefer it).
Having projection with select new is sure trigger that will make everything in your LINQ expression tree after the projection to go to the client side.
Meaning in your second case you have inefficient LINQ query as any filtering is applied in-memory and you have already transported a lot of unnecessary data.
If you want compose LINQ expressions in the way done in case 2, you can append the Select clause last or explicitly convert the result to IEnumerable<T> to make it obvious that any further processing will be done in-memory.

The first query returns the full object defined, so any additional limitations (like Where) can be appended to it before it is actually being run. Therefore the query can be combined as you showed.
The second one returns a new object, which can be whatever type and contain whatever information. Therefore the query is sent to the database as "return everything" and after the objects have been created all but the ones that match the Where clause are discarded.
Even though the type were the same in both of them, think of this situation:
var query = from wiTransmits in uow.DbContext.StatusMessages
select new WITransmits
{
MessageID = wiTranmits.MessageID * 4 - 2,
Name = wiTransmits.Name
};
How would you combine the Where query now? Sure, you could go through the code inside the new object creation and try to move it outside, but since there can be anything it is not feasible. What if the checkup is some lookup function? What if it's not deterministic?
Therefore if you create new objects based on the database objects there will be a border where the objects will be retrieved and then further queries will be done in memory.

Complex Linq-To-Entities query with deferred execution: prevent OrderBy being used as a subquery/projection

I built a dynamic LINQ-to-Entities query to support optional search parameters. It was quite a bit of work to get this producing performant SQL and I am NEARLY there, but I stumble across a big issue with OrderBy which gets translated into kind of a projection / subquery containing the actual query, causing extremely inperformant SQL. I can't find a solution to get this right. Maybe someone can help me out :)
I spare you the complete query for now as it is long and complex, I translate it into a simple sample for better understanding:
I'm doing something like this:
// Start with the base query
var query = from a in db.Articles
where a.UserId = 1;
// Apply some optional conditions
if (tagParam != null)
query = query.Where(a => a.Tag = tagParam);
if (authorParam != null)
query = query.Where(a => a.Author = authorParam);
// ... and so on ...
// I only want the 50 most recent articles, so I finally want to apply Take and OrderBy
query = query.OrderByDescending(a => a.Published);
query = query.Take(50);
The resulting SQL strangely translates the OrderBy in an container query:
select top 50 Id, Published, Title, Content
from (select Id, Published, Title Content
from Articles
where UserId = 1
and Author = #paramAuthor)
order by Published desc
Note that also the Top 50 got moved to the outer query. In case I would only use Take(50), the top 50 sql statement would correctly be applied to the inner query above (the outer query wouldn't even exist). Only when I use OrderBy, Linq-To-Entities uses this container query approach.
This causes a very bad execution plan where the inner query takes all articles that apply to the parameters from Disk and pass them to the outer query - and only there, OrderBy and Top is processed. In my case, this can be hundred thousands of lines. I already tried to move the order by manually into the inner statement and execute this - this produces much better results as the existing indexes allow the SQL Server to easily find the top 50 rows in right order without reading all rows from disk.
Is there any way I can get EF to append the order by clause to the inner query? Or any other trick to get this working right?
Any help would be greatly appreciated :)
Edit: As an additional information, some tests with less complex queries showed that the Optimizer normally handles such subquery scenarios well. In my scenario, the Optimizer fails on this unfortunately and moves hundrets of thousands of rows through the query plan. But moving the OrderBy to the inner query solves it and the Optimizer does it right.
Edit 2: After couple of hours of more testing it seems the issue with the wrong execution plan is a SQL Server issue that is not caused by the created container query. While the move of the order by and top clause into the inside query did fix the issue initially, I can't reproduce this now anymore, SQL Server started using the bad execution plan now also here (while the data in the DB remained unchanged). The move of the order by clause might caused SQL Server to take other statistics into account but it seems it was not due to the better/more clean query design. However, I still want to know why EF uses a container query here and if I can influence this behavior. If it will not improve performance, at least it would make debugging easier if the generated EF queries are more straightforward and not that convoluted.

LINQ Select Statement making many SQL Calls

I am trying to run a Select on an IQUeryable linked to a database query. Its working correctly, but for all the properties being selecte-d, its running a seperate query.
My code looks something like this
IQueryable<MyDataSource> data = [Some Complicated Query I've been Building Up];
var results = data.Select(d => new
{
A = d.A,
B = d.B,
C = d.C
}).Take(100).ToArray();
Now, this is taking ages, even though the actual Query isn't taking that long.
When I ran an SQL profiler on it, I'm finding out that its running a different SQL select procedure for each property I'm selecting - for each entity I'm returning (so in the above example around 300 different queries, as well as the actual first query to perform the filtering).
I'm quite sure I'm doing something wrong here, what is it? I'm expecting it to run a single large query - which selects the right columns from the datasource (You know Select top 100 d.A, d.B, d.C from [bla bla] ), not all this mess.

You can influence the lazy/eager loading using DataLoadOptions
This will force eager loading for B
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<a>(a => a.B);
dc.LoadOptions = options;

Are different IQueryable objects combined?

I have a little program that needs to do some calculation on a data range. The range maybe contain about half a millon of records. I just looked to my db and saw that a group by was executed.
I thought that the result was executed on the first line, and later I just worked with data in RAM. But now I think that the query builder combine the expression.
var Test = db.Test.Where(x => x > Date.Now.AddDays(-7));
var Test2 = (from p in Test
group p by p.CustomerId into g
select new { UniqueCount = g.Count() } );
In my real world app I got more subqueries that is based on the range selected by the first query. I think I just added a big overhead to let the DB make different selects.
Now I bascilly just call .ToList() after the first expression.
So my question is am I right about that the query builder combine different IQueryable when it builds the expression tree?

Yes, you are correct. LINQ expressions are lazily evaluated at the moment you evaluate them (via .ToList(), for example). At that point in time, Entity Framework will look at the total query and build an SQL statement to represent it.
In this particular case, it's probably wiser to not evaluate the first query, because the SQL database is optimized for performing set-based operations like grouping and counting. Rather than forcing the database to send all the Test objects across the wire, deserializing the results into in-memory objects, and then performing the grouping and counting locally, you will likely see better performance by having the SQL database just return the resulting Counts.

Joins and subqueries in LINQ

I am trying to do a join with a sub query and can't seem to get it. Here is what is looks like working in sql. How do I get to to work in linq?
SELECT po.*, p.PermissionID
FROM PermissibleObjects po
INNER JOIN PermissibleObjects_Permissions po_p ON (po.PermissibleObjectID = po_p.PermissibleObjectID)
INNER JOIN Permissions p ON (po_p.PermissionID = p.PermissionID)
LEFT OUTER JOIN
(
SELECT u_po.PermissionID, u_po.PermissibleObjectID
FROM Users_PermissibleObjects u_po
WHERE u_po.UserID = '2F160457-7355-4B59-861F-9871A45FD166'
) used ON (p.PermissionID = used.PermissionID AND po.PermissibleObjectID = used.PermissibleObjectID)
WHERE used.PermissionID is null

Without seeing your database and data model, it's pretty impossible to offer any real help. But, probably the best way to go is:
download linqpad - http://www.linqpad.net/
create a connection to your database
start with the innermost piece - the subquery with the "where" clause
get each small query working, then join them up. Linqpad will show you the generated SQL, as well as the results, so build your small queries up until they are right
So, basically, split your problem up into smaller pieces. Linqpad is fantastic as it lets you test these things out, and check your results as you go
hope this helps, good luck
Toby

The LINQ translation for your query is suprisingly simple:
from pop in PermissibleObjectPermissions
where !pop.UserPermissibleObjects.Any (
upo => upo.UserID == new Guid ("2F160457-7355-4B59-861F-9871A45FD166"))
select new { pop.PermissibleObject, pop.PermissionID }
In words: "From all object permissions, retrieve those with at least one user-permission whose UserID is 2F160457-7355-4B59-861F-9871A45FD16".
You'll notice that this query uses association properties for navigating relationships - this avoids the need for "joining" and simplfies the query. As a result, the LINQ query is much closer to its description in English than the original SQL query.
The trick, when writing LINQ queries, is to get out of the habit of "transliterating" SQL into LINQ.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.