This rather simple SQL query is proving to be quite perplexing when attempting it from LINQ.
I have a SQL table Plant with column ZoneMin.
I want to find the minimum and maximum of the values in the column.
The answer in T-SQL is quite simple:
SELECT MIN(ZoneMin), MAX(ZoneMin) FROM Plant
What's a LINQ query that could get me to this (or some similar) SQL?
I've made various attempts at .Aggregate() and .GroupBy() with no luck. I've also looked at several SO questions that seem similar.
This could be simply achieved with methods applied to a resulting array, but I shouldn't need to transport a value from every SQL row when it's so simple in T-SQL.
To achieve the same performance as your original query, you'll need to use grouping (by a constant to minimize impact, e.g. 0), so that you can refer to the same set of records twice in the same query. Using the table name causes a new query to be produced on each reference. Try the following:
(from plant in db.Plants
group plant by 0 into plants
select new { Min = plants.Min(p => p.ZoneMin), Max = plants.Max(p => p.ZoneMin) }
).Single()
This produces the following query:
SELECT MIN(plants.ZoneMin), MAX(plants.ZoneMin)
FROM (SELECT 0 AS Grp, ZoneMin FROM Plants) AS plants
GROUP BY plants.Grp
And after the optimizer is done with it, it spits out something equivalent to your query, at least according to SQL Server Management Studio.
Related
OrderBy is stable for LINQ to Objects, but MSDN on Queryable.OrderBy doesn't mention if it is stable or not.
I guess it depends on the provider implementation. Is it unstable for SQL Server? Because it looks so. I did a quick look at Queryable source code, but it is not obvious from there.
I need to order a collection before other operations and I want to use IQueryable, rather than IEnumerable for the sake of performance.
// All the timestamps are the same and I am getting inconsistent
// results by running it multiple times, first few pages return the same results
var result = data.OrderBy(i => i.TimeStamp).Skip(start).Take(length);
but if I use
var result = data.ToList().OrderBy(i => i.TimeStamp).Skip(start).Take(length);
It works just fine, but I lose performance boost from LINQ to SQL. It seems combination of Queryable OrderBy/Skip/Take produce inconsistent results.
SQL Code generated seems fine to me:
SELECT
...
FROM [dbo].[Table] AS [Extent1]
ORDER BY [Extent1].[TimeStamp] ASC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
In Linq-to-Entities LINQ queries are translated into SQL queries so Linq-to-Objects implementation of OrderBy doesn't matter. You should look at your database implementation of ORDER BY. If you are using MS SQL you can find in docs that:
To achieve stable results between query requests using OFFSET and FETCH, the following conditions must be met:
(...)
The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique.
So ORDER BY for the same values does not guarantee the same order so limiting it could provide different results set. To solve this you can simply sort by some additional column that has unique values e.g. id. So basically you will have:
var result = data
.OrderBy(i => i.TimeStamp)
.ThenBy(i => i.Id)
.Skip(start)
.Take(length);
I take it that by "stable", you mean consistent. If you didn't have the ORDER BY in a SQL query, the order of the data is not guaranteed for each time you run the query. It will simply return all of the data in whatever order is most efficient for the server. When you add the ORDER BY, it will sort that data. Since you are sorting data where all of the sort values are the same, no rows are being reordered, so the ordered data is in an order you don't expect. If you need a specific order, you will need to add a secondary sort column such as an ID.
It is a best to never assume the order of data coming back from the server unless you explicitly define what that order is.
I have a little program that needs to do some calculation on a data range. The range maybe contain about half a millon of records. I just looked to my db and saw that a group by was executed.
I thought that the result was executed on the first line, and later I just worked with data in RAM. But now I think that the query builder combine the expression.
var Test = db.Test.Where(x => x > Date.Now.AddDays(-7));
var Test2 = (from p in Test
group p by p.CustomerId into g
select new { UniqueCount = g.Count() } );
In my real world app I got more subqueries that is based on the range selected by the first query. I think I just added a big overhead to let the DB make different selects.
Now I bascilly just call .ToList() after the first expression.
So my question is am I right about that the query builder combine different IQueryable when it builds the expression tree?
Yes, you are correct. LINQ expressions are lazily evaluated at the moment you evaluate them (via .ToList(), for example). At that point in time, Entity Framework will look at the total query and build an SQL statement to represent it.
In this particular case, it's probably wiser to not evaluate the first query, because the SQL database is optimized for performing set-based operations like grouping and counting. Rather than forcing the database to send all the Test objects across the wire, deserializing the results into in-memory objects, and then performing the grouping and counting locally, you will likely see better performance by having the SQL database just return the resulting Counts.
Background:
Entity Framework 4, with SQL Server 2008
Problem:
I have a table Order. Each row has a column Timestamp.
The user can choose some time in past and I need to get the Order closest to the specified time, but that had occurred before the specified time. In other words, the last order before the specified time.
For example, if I have orders
2008-01-12
2009-04-17
2009-09-24
2010-11-02
2010-12-01
2011-05-16
and choose a date 2010-07-22, I should get the 2009-09-24 order, because that's the last order before the specified date.
var query = (from oData in db.OrderDatas
where oData.Timestamp <= userTime
orderby oData.Timestamp ascending
select oData).Last();
This is closest to what I am trying. However, I am not sure how exactly does the Last operator work when translated to SQL, if it's translated at all.
Question:
Will this query fetch all data (earlier than userTime) and then take the last element, or will it be translated so that only one element will be returned from the database? My table can hold very large number of rows (100000+) so performance is an issue here.
Also, how would one retrieve the closest time in the database (not necessarily the earlier time)? In the example of 2010-07-22, one would get 2010-11-02, because it is closer to the date specified than 2009-09-24.
In general, if you're concerned about how LINQ behaves, you should check what does happen with the SQL. If you haven't worked out how to see how your LINQ queries are turned into SQL, that should be the very next thing you do.
As you noted in your comment, Last() isn't supported by LINQ to SQL so the same may be true for EF. Fortunately, it's easy to use First() instead:
var query = (from oData in db.OrderDatas
where oData.Timestamp <= userTime
orderby oData.Timestamp descending
select oData).First();
Try using:
var query = (from oData in db.OrderDatas
where oData.Timestamp <= userTime
orderby oData.Timestamp descending
select oData).Take(1);
It's the equivalent of TOP 1
Question:
Will this query fetch all data (earlier than userTime) and then take
the last element, or will it be translated so that only one element
will be returned from the database? My table can hold very large
number of rows (100000+) so performance is an issue here.
In this case, using the first() approach, the query will be executed immediately and it will optimized in such a way that it will ony retrieve 1 record. Most probably a top(1) select. You really need to check the genereated sql with a sql profilihg tool or by using the log of the datacontext. Or you can use linqpad. linq-2-sql can lead to N+1 queries if not used the proper way. This behaviour is quite predictable but in the beginning you really have to be aware.
I have a setup on SQL Server 2008. I've got three tables. One has a string identifier as a primary key. The second table holds indices into an attribute table. The third simply holds foreign keys into both tables- so that the attributes themselves aren't held in the first table but are instead referred to. Apparently this is common in database normalization, although it is still insane because I know that, since the key is a string, it would take a maximum of 1 attribute per 30 first table room entries to yield a space benefit, let alone the time and complexity problems.
How can I write a LINQ to SQL query to only return values from the first table, such that they hold only specific attributes, as defined in the list in the second table? I attempted to use a Join or GroupJoin, but apparently SQL Server 2008 cannot use a Tuple as the return value.
"I attempted to use a Join or
GroupJoin, but apparently SQL Server
2008 cannot use a Tuple as the return
value".
You can use anonymous types instead of Tuples which are supported by Linq2SQL.
IE:
from x in source group x by new {x.Field1, x.Field2}
I'm not quite clear what you're asking for. Some code might help. Are you looking for something like this?
var q = from i in ctx.Items
select new
{
i.ItemId,
i.ItemTitle,
Attributes = from map in i.AttributeMaps
select map.Attribute
};
I use this page all the time for figuring out complex linq queries when I know the sql approach I want to use.
VB http://msdn.microsoft.com/en-us/vbasic/bb688085
C# http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx
If you know how to write the sql query to get the data you want then this will show you how to get the same result translating it into linq syntax.
I am trying to do a join with a sub query and can't seem to get it. Here is what is looks like working in sql. How do I get to to work in linq?
SELECT po.*, p.PermissionID
FROM PermissibleObjects po
INNER JOIN PermissibleObjects_Permissions po_p ON (po.PermissibleObjectID = po_p.PermissibleObjectID)
INNER JOIN Permissions p ON (po_p.PermissionID = p.PermissionID)
LEFT OUTER JOIN
(
SELECT u_po.PermissionID, u_po.PermissibleObjectID
FROM Users_PermissibleObjects u_po
WHERE u_po.UserID = '2F160457-7355-4B59-861F-9871A45FD166'
) used ON (p.PermissionID = used.PermissionID AND po.PermissibleObjectID = used.PermissibleObjectID)
WHERE used.PermissionID is null
Without seeing your database and data model, it's pretty impossible to offer any real help. But, probably the best way to go is:
download linqpad - http://www.linqpad.net/
create a connection to your database
start with the innermost piece - the subquery with the "where" clause
get each small query working, then join them up. Linqpad will show you the generated SQL, as well as the results, so build your small queries up until they are right
So, basically, split your problem up into smaller pieces. Linqpad is fantastic as it lets you test these things out, and check your results as you go
hope this helps, good luck
Toby
The LINQ translation for your query is suprisingly simple:
from pop in PermissibleObjectPermissions
where !pop.UserPermissibleObjects.Any (
upo => upo.UserID == new Guid ("2F160457-7355-4B59-861F-9871A45FD166"))
select new { pop.PermissibleObject, pop.PermissionID }
In words: "From all object permissions, retrieve those with at least one user-permission whose UserID is 2F160457-7355-4B59-861F-9871A45FD16".
You'll notice that this query uses association properties for navigating relationships - this avoids the need for "joining" and simplfies the query. As a result, the LINQ query is much closer to its description in English than the original SQL query.
The trick, when writing LINQ queries, is to get out of the habit of "transliterating" SQL into LINQ.