Joins and subqueries in LINQ

Joins and subqueries in LINQ - c#

I am trying to do a join with a sub query and can't seem to get it. Here is what is looks like working in sql. How do I get to to work in linq?
SELECT po.*, p.PermissionID
FROM PermissibleObjects po
INNER JOIN PermissibleObjects_Permissions po_p ON (po.PermissibleObjectID = po_p.PermissibleObjectID)
INNER JOIN Permissions p ON (po_p.PermissionID = p.PermissionID)
LEFT OUTER JOIN
(
SELECT u_po.PermissionID, u_po.PermissibleObjectID
FROM Users_PermissibleObjects u_po
WHERE u_po.UserID = '2F160457-7355-4B59-861F-9871A45FD166'
) used ON (p.PermissionID = used.PermissionID AND po.PermissibleObjectID = used.PermissibleObjectID)
WHERE used.PermissionID is null

Without seeing your database and data model, it's pretty impossible to offer any real help. But, probably the best way to go is:
download linqpad - http://www.linqpad.net/
create a connection to your database
start with the innermost piece - the subquery with the "where" clause
get each small query working, then join them up. Linqpad will show you the generated SQL, as well as the results, so build your small queries up until they are right
So, basically, split your problem up into smaller pieces. Linqpad is fantastic as it lets you test these things out, and check your results as you go
hope this helps, good luck
Toby

The LINQ translation for your query is suprisingly simple:
from pop in PermissibleObjectPermissions
where !pop.UserPermissibleObjects.Any (
upo => upo.UserID == new Guid ("2F160457-7355-4B59-861F-9871A45FD166"))
select new { pop.PermissibleObject, pop.PermissionID }
In words: "From all object permissions, retrieve those with at least one user-permission whose UserID is 2F160457-7355-4B59-861F-9871A45FD16".
You'll notice that this query uses association properties for navigating relationships - this avoids the need for "joining" and simplfies the query. As a result, the LINQ query is much closer to its description in English than the original SQL query.
The trick, when writing LINQ queries, is to get out of the habit of "transliterating" SQL into LINQ.

Related

Linq 2 Entities: Subquery based on query result

I moved from Linq 2 SQL to Linq 2 Entities due to performance reasons for complex queries.
It's a little bit frustrating to get everything working.
In L2SQL I can run subqueries without any problems.
Now I figured out, that in L2E I have to place the subquery outside the query.
That's usually no problem, but how can I run a query if the subquery depends on the query?
For example:
I want to get the status of an document but for that I need the current documentID.
var query = from Dok in dbContext.Dokumente
orderby Dok.DokumentID descending
select new ClassDok() {
_Nr = Dok.Dokumentnr,
_StatusID = dbContext.Database.SqlQuery<int>("GetDokStatusID", Dok.DokumentID).Single()
};
I already tried to put
dbContext.Database.SqlQuery<int>("GetDokStatusID", Dok.DokumentID).Single()
into a method and return integer, but it doesn't work.

Complex Linq-To-Entities query with deferred execution: prevent OrderBy being used as a subquery/projection

I built a dynamic LINQ-to-Entities query to support optional search parameters. It was quite a bit of work to get this producing performant SQL and I am NEARLY there, but I stumble across a big issue with OrderBy which gets translated into kind of a projection / subquery containing the actual query, causing extremely inperformant SQL. I can't find a solution to get this right. Maybe someone can help me out :)
I spare you the complete query for now as it is long and complex, I translate it into a simple sample for better understanding:
I'm doing something like this:
// Start with the base query
var query = from a in db.Articles
where a.UserId = 1;
// Apply some optional conditions
if (tagParam != null)
query = query.Where(a => a.Tag = tagParam);
if (authorParam != null)
query = query.Where(a => a.Author = authorParam);
// ... and so on ...
// I only want the 50 most recent articles, so I finally want to apply Take and OrderBy
query = query.OrderByDescending(a => a.Published);
query = query.Take(50);
The resulting SQL strangely translates the OrderBy in an container query:
select top 50 Id, Published, Title, Content
from (select Id, Published, Title Content
from Articles
where UserId = 1
and Author = #paramAuthor)
order by Published desc
Note that also the Top 50 got moved to the outer query. In case I would only use Take(50), the top 50 sql statement would correctly be applied to the inner query above (the outer query wouldn't even exist). Only when I use OrderBy, Linq-To-Entities uses this container query approach.
This causes a very bad execution plan where the inner query takes all articles that apply to the parameters from Disk and pass them to the outer query - and only there, OrderBy and Top is processed. In my case, this can be hundred thousands of lines. I already tried to move the order by manually into the inner statement and execute this - this produces much better results as the existing indexes allow the SQL Server to easily find the top 50 rows in right order without reading all rows from disk.
Is there any way I can get EF to append the order by clause to the inner query? Or any other trick to get this working right?
Any help would be greatly appreciated :)
Edit: As an additional information, some tests with less complex queries showed that the Optimizer normally handles such subquery scenarios well. In my scenario, the Optimizer fails on this unfortunately and moves hundrets of thousands of rows through the query plan. But moving the OrderBy to the inner query solves it and the Optimizer does it right.
Edit 2: After couple of hours of more testing it seems the issue with the wrong execution plan is a SQL Server issue that is not caused by the created container query. While the move of the order by and top clause into the inside query did fix the issue initially, I can't reproduce this now anymore, SQL Server started using the bad execution plan now also here (while the data in the DB remained unchanged). The move of the order by clause might caused SQL Server to take other statistics into account but it seems it was not due to the better/more clean query design. However, I still want to know why EF uses a container query here and if I can influence this behavior. If it will not improve performance, at least it would make debugging easier if the generated EF queries are more straightforward and not that convoluted.

DbContext times out on remote server only

I've got a Linq To Sql query (or with brackets) here that works on my local SQL2008, in about 00:00:00s - 00:00:01s, but on the remote server, it takes around 00:02:10s. There's about 56k items in dbo.Movies, dbo.Boxarts, and 300k in dbo.OmdbEntries
{SELECT
//pull distinct t_meter out of the created object
Distinct2.t_Meter AS t_Meter
//match all movie data on the same movie_id
FROM ( SELECT DISTINCT
Extent2.t_Meter AS t_Meter
FROM dbo.Movies AS Extent1
INNER JOIN dbo.OmdbEntries AS Extent2 ON Extent1.movie_ID = Extent2.movie_ID
INNER JOIN dbo.BoxArts AS Extent3 ON Extent1.movie_ID = Extent3.movie_ID
//pull the genres matched on movie_ids
INNER JOIN (SELECT DISTINCT
Extent4.movie_ID AS movie_ID
FROM dbo.MovieToGenres AS Extent4
//all genres matched on movie ids
INNER JOIN dbo.Genres AS Extent5 ON Extent4.genre_ID = Extent5.genre_ID ) AS Distinct1 ON Distinct1.movie_ID = Extent1.movie_ID
WHERE 1 = 1
//sort the t_meters by ascending
) AS Distinct2
ORDER BY Distinct2.t_Meter ASC}
The inner query first takes all the related items in the tables and then creates a new object, then from that object, find only the t_Meters that aren't null. Then from those t_Meters, select only the distinct items and then sort them, to return a list of 98 or so ints.
I don't know enough about SQL Databases yet or not to intuitively know whether or not that that's an extreme set of db calls to put into a single query, but since it only takes a second or less on my local server, I thought it was alright.
edit: Here's the LINQ code that I haven't really cleaned up at all: http://pastebin.com/JUkdjHDJ It's messy, but it gets the job done... The fix I found was calling ToArray after OrderBy, but before Distinct helped out immensely. So instead of
var results = IQueryableWithDBDatasTMeter.Distinct().OrderBy().ToArray()
I did
var orderedResults = IQueryableWithDBDatasTMeter.OrderBy().ToArray()
var distinctOrderedResults = orderedResults.Distinct().ToArray()
I'm sure had I linked the Linq code (and cleaned it up) rather than the autogenerated SQL query, you would have been able to solve this easily, sorry about that.

Here's the LINQ code that I haven't really cleaned up at all: http://pastebin.com/JUkdjHDJ It's messy, but it gets the job done... The fix I found was calling ToArray after OrderBy, but before Distinct helped out immensely. So instead of
var results = IQueryableWithDBDatasTMeter.Distinct().OrderBy().ToArray()
I did
var orderedResults = IQueryableWithDBDatasTMeter.OrderBy().ToArray()
var distinctOrderedResults = orderedResults.Distinct().ToArray()
I guess it works because it's running the Distinct only against the Array in memory, rather than the entire DB's worth of entries? I'm not really sure though, since the old LINQ works flawlessly on my local server.
I'm sure had I linked the Linq code (and cleaned it up) rather than the autogenerated SQL query, you would have been able to solve this easily, sorry about that.

Entity Framework and large queries. What's practical?

I'm from old school where DB had all data access encapsulated into views, procedures, etc. Now I'm forcing myself into using LINQ for most of the obvious queries.
What I'm wondering though, is when to stop and what practical? Today I needed to run query like this:
SELECT D.DeviceKey, D.DeviceId, DR.DriverId, TR.TruckId, LP.Description
FROM dbo.MBLDevice D
LEFT OUTER JOIN dbo.DSPDriver DR ON D.DeviceKey = DR.DeviceKey
LEFT OUTER JOIN dbo.DSPTruck TR ON D.DeviceKey = TR.DeviceKey
LEFT OUTER JOIN
(
SELECT LastPositions.DeviceKey, P.Description, P.Latitude, P.Longitude, P.Speed, P.DeviceTime
FROM dbo.MBLPosition P
INNER JOIN
(
SELECT D.DeviceKey, MAX(P.PositionKey) LastPositionKey
FROM dbo.MBLPosition P
INNER JOIN dbo.MBLDevice D ON P.DeviceKey = D.DeviceKey
GROUP BY D.DeviceKey
) LastPositions ON P.PositionKey = LastPositions.LastPositionKey
) LP ON D.DeviceKey = LP.DeviceKey
WHERE D.IsActive = 1
Personally, I'm not able to write corresponing LINQ. So, I found tool online and got back 2 page long LINQ. It works properly-I can see it in profiler but it's not maintainable IMO. Another problem is that I'm doing projection and getting Anonymous object back. Or, I can manually create class and project into that custom class.
At this point I wonder if it is better to create View on SQL Server and add it to my model? It will break my "all SQL on cliens side" mantra but will be easier to read and maintain. No?
I wonder where you stop with T-SQL vs LINQ ?
EDIT
Model description.
I have DSPTrucks, DSPDrivers and MBLDevices.
Device can be attached to Truck or to Driver or to both.
I also have MBLPositions which is basically pings from device (timestamp and GPS position)
What this query does - in one shot it returns all device-truck-driver information so I know what this device attached to and it also get's me last GPS position for those devices. Response may look like so:
There is some redundant stuff but it's OK. I need to get it in one query.

In general, I would also default to LINQ for most simple queries.
However, when you get at a point where the corresponding LINQ query becomes harder to write and maintain, then what's the point really? So I would simply leave that query in place. It works, after all. To make it easier to use it's pretty straight-forward to map a view or cough stored procedure in your EF model. Nothing wrong with that, really (IMO).

You can firstly store Linq queries in variables which may help to make it not only more readable, but also reusable.
An example maybe like the following:
var redCars = from c in cars
where c.Colour == "red"
select c;
var redSportsCars = from c in redCars
where c.Type == "Sports"
select c;
Queries are lazily executed and not composed until you compile them or iterate over them so you'll notice in profiler that this does produce an effecient query
You will also benifit from defining relationships in the model and using navigation properties, rather than using the linq join syntax. This (again) will make these relationships reusable between queries, and more readable (because you don't specify the relationships in the query like the SQL above)
Generally speaking your LINQ query will be shorter than the equivalent SQL, but I'd suggest trying to work it out by hand rather than using a conversion tool.
With the exception of CTEs (which I'm fairly sure you can't do in LINQ) I would write all queries in LINQ these days

I find when using LINQ its best to ignore whatever sql it generates as long as its retrieving the right thing and is performant, only when one of those doesn't work do I actually look at what its generating.
In terms of the sql it generates being maintainable, you shouldn't really worry about the SQL being maintainable but more the LINQ query that is generating the SQL.
In the end if the sql is not quite right I believe there are various things you can do to make LINQ generate SQL more along the lines you want..to some extent.
AFAIK there isn't any inherent problem with getting anonymous objects back, however if you are doing it it multiple places you may want to create a class to keep things neater.

linq sql - aggregate count

I have some linq that returns the correct data.
var numEmails = (from row in EmailBatchProposal
where row.EmailBatchId == emailBatchId
select row.EmailBatchProposalId).Count();
However, if I understand linq correctly, this does not perform optimally. It grabs all the data and then walks through the list and counts the rows. What I'd really like is for linq (in the background) to use like:
Select count(*) from ...
I trust the performance reasons are obvious.
Does anyone know the proper way to do this?

Actually, if the linq query is used with a collection that implements IQueryable and supports translation into underlying SQL variant, it is quite a basic functionality to translate the Count function from your example correctly.

People generally learn best by practicing. I would suggest you get a copy of LinqPad (free), enter in your Linq query, and see what SQL it generates. Then you can modify the Linq query until you get exactly what you want.

Actually, the LINQ-to-SQL is smart enough to know that it should do a count... For example, if I have the following query:
var v = (from u in TblUsers
select u).Count();
the SQL that actually executes when I run this is:
SELECT COUNT(*) AS [value]
FROM [tblUsers] AS [t0]
Amazing, huh? Another poster made the really good suggestion of getting LinqPad - it is a WONDERFUL tool. It will show you the exact SQL that gets executed. You can always confirm with SQL profiler too.

Check the Log of the SQL used in the query.
using (var dbc = new siteDataContext())
{
dbc.Log = Console.Out;
var bd = (from b in dbc.birthdays
select b).Count();
Console.WriteLine("{0}", bd);
}
gives the query:
SELECT COUNT(*) AS [value]
FROM [dbo].[birthdays] AS [t0]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1

You can just use the argument of .Count().
int numEmails = EmailBatchProposal.Count(x => x.EmailBatchId == emailBatchId)
As noted below, this doesn't actually resolve to any different SQL, but I think it's at least a cleaner looking alternative.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.