LINQ Where in MAX - c#

I currently have this linq statement:
from s in SubContentRevisions
where s.SubContentID.Equals("e3f319f1-65cc-4799-b84d-309941dbc1da")
&& s.RevisionNumber == (SubContentRevisions.Max(s1 => s1.RevisionNumber))
select s
which generates this SQL (according to LINQPad):
-- Region Parameters
DECLARE #p0 UniqueIdentifier = 'e3f319f1-65cc-4799-b84d-309941dbc1da'
-- EndRegion
SELECT [t0].[SubContentRevisionID], [t0].[SubContentID], [t0].[RevisionNumber], [t0].[RevisionText], [t0].[CreatedDate], [t0].[ModifiedDate]
FROM [SubContentRevision] AS [t0]
WHERE ([t0].[SubContentID] = #p0) AND ([t0].[RevisionNumber] = ((
SELECT MAX([t1].[RevisionNumber])
FROM [SubContentRevision] AS [t1]
)))
How can I make it generate this SQL statement? I can't seem to find anything related anywhere. (I need it to add the where clause to the subquery)
-- Region Parameters
DECLARE #p0 UniqueIdentifier = 'e3f319f1-65cc-4799-b84d-309941dbc1da'
-- EndRegion
SELECT [t0].[SubContentRevisionID], [t0].[SubContentID], [t0].[RevisionNumber], [t0].[RevisionText], [t0].[CreatedDate], [t0].[ModifiedDate]
FROM [SubContentRevision] AS [t0]
WHERE ([t0].[SubContentID] = #p0) AND ([t0].[RevisionNumber] = ((
SELECT MAX([t1].[RevisionNumber])
FROM [SubContentRevision] AS [t1]
WHERE [SubContentID] = #p0 -- **********Adds the where clause**********
)))

I think you want:
from s in SubContentRevisions
where s.SubContentID.Equals("e3f319f1-65cc-4799-b84d-309941dbc1da")
&& s.RevisionNumber == (SubContentRevisions.Where(s.SubContentID.Equals("..."))
.Max(s1 => s1.RevisionNumber))
select s
Or, more clearly:
var specificSubContents = SubContentRevisions.Where(s =>
s.SubContentID.Equals("e3f319f1-65cc-4799-b84d-309941dbc1da")
var query = from s in specificSubContents
where s.RevisionNumber = s.Max(s1 => s1.RevisionNumber)
select s;
Alternatively, it sounds like you could actually do:
var latest = (from s in SubContentRevisions
where s.SubContentID.Equals("e3f319f1-65cc-4799-b84d-309941dbc1da")
orderby s.RevisionNumber descending
select s).FirstOrDefault();

How about adding the where clause to the subquery (max):
from s in SubContentRevisions
where s.SubContentID.Equals("e3f319f1-65cc-4799-b84d-309941dbc1da")
&& s.RevisionNumber == (SubContentRevisions
.Where(s1 => s1.SubContentID.Equals(s.SubContentID))
.Max(s1 => s1.RevisionNumber))
select s

Related

Why are multiple where in LINQ so slow?

Using C# and Linq to SQL, I found that my query with multiple where is orders of magnitude slower than with a single where / and.
Here is the query
using (TeradiodeDataContext dc = new TeradiodeDataContext())
{
var filterPartNumberID = 71;
var diodeIDsInBlades = (from bd in dc.BladeDiodes
select bd.DiodeID.Value).Distinct();
var diodesWithTestData = (from t in dc.Tests
join tt in dc.TestTypes on t.TestTypeID equals tt.ID
where tt.DevicePartNumberID == filterPartNumberID
select t.DeviceID.Value).Distinct();
var result = (from d in dc.Diodes
where d.DevicePartNumberID == filterPartNumberID
where diodesWithTestData.Contains(d.ID)
where !diodeIDsInBlades.Contains(d.ID)
orderby d.Name
select d);
var list = result.ToList();
// ~15 seconds
}
However, when the condition in the final query is this
where d.DevicePartNumberID == filterPartNumberID
& diodesWithTestData.Contains(d.ID)
& !diodeIDsInBlades.Contains(d.ID)
// milliseconds
it is very fast.
Comparing the SQL in result before calling ToList(), here are the queries (value 71 manually added in place of #params)
-- MULTIPLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t2].[value]
FROM (
SELECT [t1].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t1]
) AS [t2]
) AS [t3]
WHERE [t3].[value] = [t0].[ID]
))) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t4].[DeviceID] AS [value], [t5].[DevicePartNumberID]
FROM [dbo].[Test] AS [t4]
INNER JOIN [dbo].[TestType] AS [t5] ON [t4].[TestTypeID] = ([t5].[ID])
) AS [t6]
WHERE [t6].[DevicePartNumberID] = (71)
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)) AND ([t0].[DevicePartNumberID] = 71)
ORDER BY [t0].[Name]
and
-- SINGLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE ([t0].[DevicePartNumberID] = 71) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t3].[value]
FROM (
SELECT [t1].[DeviceID] AS [value], [t2].[DevicePartNumberID]
FROM [dbo].[Test] AS [t1]
INNER JOIN [dbo].[TestType] AS [t2] ON [t1].[TestTypeID] = ([t2].[ID])
) AS [t3]
WHERE [t3].[DevicePartNumberID] = (71)
) AS [t4]
WHERE [t4].[value] = [t0].[ID]
)) AND (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t5].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t5]
) AS [t6]
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)))
ORDER BY [t0].[Name]
The two SQL queries execute in < 1 second in SSMS and produce the same results.
So I'm wondering why the first is slower on the LINQ side. It's worrying to me because I know I've used multiple where elsewhere, without being aware of a such a severe performance impact.
This question even has answered with both multiple & and where. And this answer even suggests using multiple where clauses.
Can anyone explain why this happens in my case?
Because writing like this
if (someParam1 != 0)
{
myQuery = myQuery.Where(q => q.SomeField1 == someParam1)
}
if (someParam2 != 0)
{
myQuery = myQuery.Where(q => q.SomeField2 == someParam2)
}
is NOT(upd) the same as (in case when someParam1 and someParam2 != 0)
myQuery = from t in Table
where t.SomeField1 == someParam1
&& t.SomeField2 == someParam2
select t;
is (NOT deleted) the same as
myQuery = from t in Table
where t.SomeField1 == someParam1
where t.SomeField2 == someParam2
select t;
UPD
Yes, I do mistake. Second query is same, first is not same.
First and Second queries not EXACTLY the same. Let me show you what I mean.
1st query with lamda-expression writen as
t.Where(r => t.SomeField1 == someParam1 && t.SomeField2 == someParam2)
2nd query as
t.Where(r => r.SomeField1 == someParam1).Where(r => r.SomeField2 == someParam2)
In this case in generated SQL Predicate with SomeField2 goes first (it is important, see below)
In 1st case we getting this SQL:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField1 = :someParam1
AND t.SomeField2 = :someParam2
In 2 case the SQL is:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField2 = :someParam2
AND t.SomeField1 = :someParam1
As we see there are 2 'same' SQLs. As we see, the OP's SQLs are also 'same', they are different in order of predicates in WHERE clause (as in my example). And I guess that SQL optimizer generate 2 different execution plans and may be(!!!) doing NOT EXISTS, then EXISTS and then filtering take more time than do first filtering and after that do EXISTS and NOT EXISTS
UPD2
It is a 'problem' of Linq Provider (ORM). I'm using another ORM (linq2db), and it generates for me EXACTLY the same SQLs in both cases.

Linq to SQL - Joins and Skip+Take

I have Linq-to-SQL code that works with a many-to-many relationship, but note that the relationship itself has its own set of attributes (in this case, Products are in Many Categories, and each product-in-category relation has its own SortOrder attribute).
I have a Linq-to-SQL block that returns matching Products with Category membership information. When I execute the code it generates optimised T-SQL code like so:
exec sp_executesql N'SELECT [t0].[ProductId], [t0].[Name], [t1].[ProductId] AS [ProductId2], [t1].[CategoryId], [t1].[SortOrder] AS [SortOrder2], [t2].[CategoryId] AS [CategoryId2], [t2].[Name] AS [Name2] (
SELECT COUNT(*)
FROM [dbo].[ProductsInCategories] AS [t3]
INNER JOIN [dbo].[Categories] AS [t4] ON [t4].[CategoryId] = [t3].[CategoryId]
WHERE [t3].[ProductId] = [t0].[ProductId]
) AS [value]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN ([dbo].[ProductsInCategories] AS [t1]
INNER JOIN [dbo].[Categories] AS [t2] ON [t2].[CategoryId] = [t1].[CategoryId]) ON [t1].[ProductId] = [t0].[ProductId]
WHERE (([t0].[OwnerId]) = #p0) AND ([t0].[Visible] = 1)
ORDER BY [t0].[SortOrder], [t0].[Name], [t0].[ProductId], [t1].[CategoryId]',N'#p0 bigint',#p0=3
However, when I add paging instructions (i.e.".Skip(0).Take(50)") to the Linq expression the generated SQL becomes this:
exec sp_executesql N'SELECT TOP (50) [t0].[ProductId], [t0].[Name]
FROM [dbo].[Products] AS [t0]
WHERE (([t0].[OwnerId]) = #p0) AND ([t0].[Visible] = 1)
ORDER BY [t0].[SortOrder], [t0].[Name]',N'#p0 bigint',#p0=3
Which means the Category membership information isn't loaded anymore, so Linq-to-SQL then executes the manual loading code 50 times over (one for each member in the returned set):
exec sp_executesql N'SELECT [t0].[ProductId], [t0].[CategoryId], [t0].[SortOrder], [t1].[CategoryId] AS [CategoryId2], [t1].[Name]
FROM [dbo].[ProductsInCategories] AS [t0]
INNER JOIN [dbo].[Categories] AS [t1] ON [t1].[CategoryId] = [t0].[CategoryId]
WHERE [t0].[ProductId] = #x1',N'#x1 bigint',#x1=1141
(obviously the "#x1" ID parameter varies for each result from the original query).
So clearly Linq paging breaks the query and causes it to load data separately. Is there a way around this or should I do paging in my own software?
...fortunately the number of products in the database is small enough (<500) to do this, but it just feels dirty because there could be tens of thousands of products, and this just wouldn't be a good query.
EDIT:
Here is my Linq:
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Product>( p => p.ProductsInCategories );
dlo.LoadWith<ProductsInCategory>( pic => pic.Category );
this.LoadOptions = dlo;
query = from p in this.Products
select p;
// The lines below are added conditionally:
query = query.OrderBy( p => p.SortOrder ).ThenBy( p => p.Name );
query = query.Where( p => p.Visible );
query = query.Where( p => p.Name.Contains( filter ) || p.Description.Contains( filter ) );
query = query.Where( p => p.OwnerId == siteId );
The skip/take lines are added optionally, and are the only differences that cause the different T-SQL generation (as far as I know):
IQueryable<Product> query = GetProducts( siteId, category, filter, showHidden, sortBySortOrder );
///////////////////////////////////
total = query.Count();
var pagedProducts = query.Skip( pageIndex * pageSize ).Take( pageSize );
return pagedProducts;
An alternative answer which first pages the products and then selects products and categories in a parent-child structure would be like this:
var filter = "a";
var pageSize = 2;
var pageIndex = 1;
// get the correct products
var query = Products.AsQueryable();
query = query.Where (q => q.Name.Contains(filter));
query = query.OrderBy (q => q.SortOrder).ThenBy(q => q.Name);
// do paging
query = query.Skip(pageSize*pageIndex).Take(pageSize);
// now get products + categories as tree structure
var query2 = query.Select(
q=>new
{
q.Name,
Categories=q.ProductsInCategories.Select (pic => pic.Category)
});
Which produces a single SQL statement
-- Region Parameters
DECLARE #p0 NVarChar(1000) = '%a%'
DECLARE #p1 Int = 2
DECLARE #p2 Int = 2
-- EndRegion
SELECT [t2].[Name], [t4].[CategoryId], [t4].[Name] AS [Name2], [t4].[Visible], (
SELECT COUNT(*)
FROM (
SELECT [t5].[CategoryId]
FROM [ProductsInCategories] AS [t5]
WHERE [t5].[ProductId] = [t2].[ProductId]
) AS [t6]
INNER JOIN [Categories] AS [t7] ON [t7].[CategoryId] = [t6].[CategoryId]
) AS [value]
FROM (
SELECT [t1].[ProductId], [t1].[Name], [t1].[ROW_NUMBER]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[SortOrder], [t0].[Name], [t0].[ProductId]) AS [ROW_NUMBER], [t0].[ProductId], [t0].[Name]
FROM [Products] AS [t0]
WHERE [t0].[Name] LIKE #p0
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN #p1 + 1 AND #p1 + #p2
) AS [t2]
LEFT OUTER JOIN ([ProductsInCategories] AS [t3]
INNER JOIN [Categories] AS [t4] ON [t4].[CategoryId] = [t3].[CategoryId]) ON [t3].[ProductId] = [t2].[ProductId]
ORDER BY [t2].[ROW_NUMBER], [t3].[CategoryId], [t3].[ProductId]
Here is a workaround: you should construct your query based on all your conditions, perform ordering there but select only the primary key on your Product table (let's assume this is ProductId column).
The next step is to take the total count (to calculate rows should be skipped and taken),
and the last step is to select all the records from your Product table whose ProductIds are in the query (note: Skip and Take extension methods should be applied to query, not to the new select itself).
This will get you a SELECT statement similar to yours (from the first example) with related entities.
EDIT:
Just created a similar DB structure (according to the original SQL from the question):
Then used:
using (var db = new TestDataContext())
{
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Product>(p => p.ProductsInCategories);
options.LoadWith<ProductsInCategory>(pic => pic.Category);
db.LoadOptions = options;
var filter = "product";
var pageIndex = 1;
var pageSize = 10;
var query = db.Products
.OrderBy(p => p.SortOrder)
.ThenBy(p => p.Name)
.Where(p => p.Name.Contains(filter) || p.Description.Contains(filter))
.Select(p => p.ProductId);
var total = query.Count();
var products = db.Products
.Where(p => query.Skip(pageIndex * pageSize).Take(pageSize).Contains(p.ProductId))
.ToList();
}
After the .ToList() call, products variable hold products with product categories with categories. This also produced 2 SQL statement, one - for .Count() statement:
exec sp_executesql N'SELECT COUNT(*) AS [value]
FROM [dbo].[Products] AS [t0]
WHERE ([t0].[Name] LIKE #p0) OR ([t0].[Description] LIKE #p1)',N'#p0 nvarchar(4000),#p1 nvarchar(4000)',#p0=N'%product%',#p1=N'%product%'
and another one for .ToList():
exec sp_executesql N'SELECT [t0].[ProductId], [t0].[Name], [t0].[Description], [t0].[SortOrder], [t1].[ProductId] AS [ProductId2], [t1].[CategoryId], [t1].[SortOrder] AS [SortOrder2], [t2].[CategoryId] AS [CategoryId2], [t2].[Name] AS [Name2], (
SELECT COUNT(*)
FROM (
SELECT NULL AS [EMPTY]
FROM [dbo].[ProductsInCategories] AS [t6]
INNER JOIN [dbo].[Category] AS [t7] ON [t7].[CategoryId] = [t6].[CategoryId]
WHERE [t6].[ProductId] = [t0].[ProductId]
) AS [t8]
) AS [value]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN ([dbo].[ProductsInCategories] AS [t1]
INNER JOIN [dbo].[Category] AS [t2] ON [t2].[CategoryId] = [t1].[CategoryId]) ON [t1].[ProductId] = [t0].[ProductId]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT [t4].[ProductId]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t3].[SortOrder], [t3].[Name], [t3].[ProductId]) AS [ROW_NUMBER], [t3].[ProductId]
FROM [dbo].[Products] AS [t3]
WHERE ([t3].[Name] LIKE #p0) OR ([t3].[Description] LIKE #p1)
) AS [t4]
WHERE [t4].[ROW_NUMBER] BETWEEN #p2 + 1 AND #p2 + #p3
) AS [t5]
WHERE [t5].[ProductId] = [t0].[ProductId]
)
ORDER BY [t0].[ProductId], [t1].[CategoryId]',N'#p0 nvarchar(4000),#p1 nvarchar(4000),#p2 int,#p3 int',#p0=N'%product%',#p1=N'%product%',#p2=10,#p3=10
No more extra queries (as SQL Server Profiler said).

linq2sql renders unnecessary ORDER BY

I have a bit complex linq2sql query, it doesn't contain any 'order by' statements, but somehow linq2sql thinks it is necessery and inserts it. Unfortunately this 'order by' statement hurts performance and I don't know how to remove it...
Here's the linq2sql expressions (I don't think that they might help, but anyway...)
var lockedBy = Guid.NewGuid();
var locks = LinqDataContext.Instance.LockBranches(3, lockedBy, DateTime.Now + TimeSpan.FromMinutes(3)); // Stored procedure...
var x = LinqDataContext.Instance.Branches
.Where(branch => branch.LockedBy == lockedBy)
.Select
(
branch => new
{
Branch = branch,
Leaves = branch.Leaves
.Select
(
leaf => new
{
Leaf = leaf,
Estimate = leaf.Representation.Estimates
.GroupBy(estimate=>estimate.SegmentID)
.Sum
(
estimatesBySegment => estimatesBySegment.Average
(
estimate => estimate.EstimateRequests.Average
(
estimateRequest => estimateRequest.EstimateSubmit.Value
)
)
)
}
)
}
);
It renders to the following sql
SELECT [t0].[ID], [t0].[IndexID], [t0].[IndexNo], [t0].[Sealed], [t0].[LockedBy], [t0].[UnlockOn], [t1].[ID] AS [ID2], [t1].[BranchID], [t1].[RepresentationID], [t1].[Xml], (
SELECT SUM([t7].[value])
FROM (
SELECT AVG([t6].[value]) AS [value]
FROM (
SELECT (
SELECT AVG([t5].[Value])
FROM [dbo].[EstimateRequests] AS [t4]
LEFT OUTER JOIN [dbo].[EstimateSubmits] AS [t5] ON [t5].[EstimateRequestID] = [t4].[ID]
WHERE [t4].[EstimateID] = [t3].[ID]
) AS [value], [t2].[ID], [t3].[RepresentationID], [t3].[SegmentID]
FROM [dbo].[Representations] AS [t2], [dbo].[Estimates] AS [t3]
) AS [t6]
WHERE ([t6].[ID] = [t1].[RepresentationID]) AND ([t6].[RepresentationID] = [t6].[ID])
GROUP BY [t6].[SegmentID]
) AS [t7]
) AS [Estimate], (
SELECT COUNT(*)
FROM [dbo].[Leaves] AS [t8]
WHERE [t8].[BranchID] = [t0].[ID]
) AS [value]
FROM [dbo].[Branches] AS [t0]
LEFT OUTER JOIN [dbo].[Leaves] AS [t1] ON [t1].[BranchID] = [t0].[ID]
WHERE [t0].[LockedBy] = #p0
ORDER BY [t0].[ID], [t1].[ID] <-- Here's the unnecessary ORDER BY
The order by is needed to be able to distinguish the child collection Leaves. (Every time you have a child collection there is an order by)

LINQ SQL Statement is returning wrong results

I have a SQL statement which afaik is correct but the response from the SQL server is incorrect. I've debugged this issue and found that if I execute the SQL statement without the wrapping store procedure I get different results. All I have done is replaced the variable with the actual values
Linq generated code:
exec sp_executesql N'SELECT [t0].[RoomId], [t0].[Title], [t0].[Detail], [t0].[ThumbnailPath], [t0].[PageId], [t0].[TypeId], [t0].[LocationId], [t0].[TimeStamp], [t0].[DeleteStamp]
FROM [dbo].[Room] AS [t0]
INNER JOIN [dbo].[RoomType] AS [t1] ON [t1].[RoomTypeId] = [t0].[TypeId]
WHERE ([t1].[Sleeps] >= #p0) AND ([t0].[DeleteStamp] IS NULL) AND (((
SELECT COUNT(*)
FROM [dbo].[Booking] AS [t2]
INNER JOIN [dbo].[Order] AS [t3] ON [t3].[OrderId] = [t2].[OrderId]
WHERE ([t2].[StartStamp] <= #p1)
AND ([t2].[EndStamp] >= #p2)
AND (([t3].[Status] = #p3)
OR ([t3].[Status] = #p4)
OR (([t3].[Status] = #p5) AND ([t3].[CreatedStamp] > #p6)))
AND ([t2].[RoomId] = [t0].[RoomId])
)) = #p7)
',N'#p0 int,#p1 datetime,#p2 datetime,#p3 int,#p4 int,#p5 int,#p6 datetime,#p7 int',
#p0=1,#p1='2011-04-05 00:00:00',#p2='2011-04-04 00:00:00',#p3=3,#p4=5,#p5=0,#p6='2011-04-04 12:36:09.490',#p7=0
Without the SP
SELECT [t0].[RoomId], [t0].[Title], [t0].[Detail], [t0].[ThumbnailPath], [t0].[PageId], [t0].[TypeId], [t0].[LocationId], [t0].[TimeStamp], [t0].[DeleteStamp]
FROM [dbo].[Room] AS [t0]
INNER JOIN [dbo].[RoomType] AS [t1] ON [t1].[RoomTypeId] = [t0].[TypeId]
WHERE ([t1].[Sleeps] >= 1) AND ([t0].[DeleteStamp] IS NULL) AND (((
SELECT COUNT(*)
FROM [dbo].[Booking] AS [t2]
INNER JOIN [dbo].[Order] AS [t3] ON [t3].[OrderId] = [t2].[OrderId]
WHERE ([t2].[StartStamp] <= '2011-04-05 00:00:00')
AND ([t2].[EndStamp] >= '2011-04-04 00:00:00')
AND (([t3].[Status] = 3)
OR ([t3].[Status] = 4)
OR (([t3].[Status] = 5) AND ([t3].[CreatedStamp] > '2011-04-04 12:36:09.490')))
AND ([t2].[RoomId] = [t0].[RoomId])
)) = 0)
The first result set returns 1 row where as the 2nd returns me 21!!
Can anybody spot the difference as its driving me crazy.
You made an error replacing the variables!
You replaced p4 with 4 when you should have replaced it with 5 and p5 with 5 instead of 0.
Well, one difference is #p5=0 while you have [t3].[Status] = 5 in the other.

LINQ to SQL using GROUP BY and COUNT(DISTINCT)

I have to perform the following SQL query:
select answer_nbr, count(distinct user_nbr)
from tpoll_answer
where poll_nbr = 16
group by answer_nbr
The LINQ to SQL query
from a in tpoll_answer
where a.poll_nbr = 16 select a.answer_nbr, a.user_nbr distinct
maps to the following SQL query:
select distinct answer_nbr, distinct user_nbr
from tpoll_answer
where poll_nbr = 16
So far, so good. However the problem raises when trying to GROUP the results, as I'm not being able to find a LINQ to SQL query that maps to the first query I wrote here (thank you LINQPad for making this process a lot easier). The following is the only one that I've found that gives me the desired result:
from answer in tpoll_answer where answer.poll_nbr = 16 _
group by a_id = answer.answer_nbr into votes = count(answer.user_nbr)
Which in turns produces the follwing ugly and non-optimized at all SQL query:
SELECT [t1].[answer_nbr] AS [a_id], (
SELECT COUNT(*)
FROM (
SELECT CONVERT(Bit,[t2].[user_nbr]) AS [value], [t2].[answer_nbr], [t2].[poll_nbr]
FROM [TPOLL_ANSWER] AS [t2]
) AS [t3]
WHERE ([t3].[value] = 1) AND ([t1].[answer_nbr] = [t3].[answer_nbr]) AND ([t3].[poll_nbr] = #p0)
) AS [votes]
FROM (
SELECT [t0].[answer_nbr]
FROM [TPOLL_ANSWER] AS [t0]
WHERE [t0].[poll_nbr] = #p0
GROUP BY [t0].[answer_nbr]
) AS [t1]
-- #p0: Input Int (Size = 0; Prec = 0; Scale = 0) [16]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1
Any help will be more than appreciated.
There isn't direct support for COUNT(DISTINCT {x})), but you can simulate it from an IGrouping<,> (i.e. what group by returns); I'm afraid I only "do" C#, so you'll have to translate to VB...
select new
{
Foo= grp.Key,
Bar= grp.Select(x => x.SomeField).Distinct().Count()
};
Here's a Northwind example:
using(var ctx = new DataClasses1DataContext())
{
ctx.Log = Console.Out; // log TSQL to console
var qry = from cust in ctx.Customers
where cust.CustomerID != ""
group cust by cust.Country
into grp
select new
{
Country = grp.Key,
Count = grp.Select(x => x.City).Distinct().Count()
};
foreach(var row in qry.OrderBy(x=>x.Country))
{
Console.WriteLine("{0}: {1}", row.Country, row.Count);
}
}
The TSQL isn't quite what we'd like, but it does the job:
SELECT [t1].[Country], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t2].[City]
FROM [dbo].[Customers] AS [t2]
WHERE ((([t1].[Country] IS NULL) AND ([t2].[Country] IS NULL)) OR (([t1]
.[Country] IS NOT NULL) AND ([t2].[Country] IS NOT NULL) AND ([t1].[Country] = [
t2].[Country]))) AND ([t2].[CustomerID] <> #p0)
) AS [t3]
) AS [Count]
FROM (
SELECT [t0].[Country]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] <> #p0
GROUP BY [t0].[Country]
) AS [t1]
-- #p0: Input NVarChar (Size = 0; Prec = 0; Scale = 0) []
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1
The results, however, are correct- verifyable by running it manually:
const string sql = #"
SELECT c.Country, COUNT(DISTINCT c.City) AS [Count]
FROM Customers c
WHERE c.CustomerID != ''
GROUP BY c.Country
ORDER BY c.Country";
var qry2 = ctx.ExecuteQuery<QueryResult>(sql);
foreach(var row in qry2)
{
Console.WriteLine("{0}: {1}", row.Country, row.Count);
}
With definition:
class QueryResult
{
public string Country { get; set; }
public int Count { get; set; }
}
The Northwind example cited by Marc Gravell can be rewritten with the City column selected directly by the group statement:
from cust in ctx.Customers
where cust.CustomerID != ""
group cust.City /*here*/ by cust.Country
into grp
select new
{
Country = grp.Key,
Count = grp.Distinct().Count()
};
Linq to sql has no support for Count(Distinct ...). You therefore have to map a .NET method in code onto a Sql server function (thus Count(distinct.. )) and use that.
btw, it doesn't help if you post pseudo code copied from a toolkit in a format that's neither VB.NET nor C#.
This is how you do a distinct count query. Note that you have to filter out the nulls.
var useranswercount = (from a in tpoll_answer
where user_nbr != null && answer_nbr != null
select user_nbr).Distinct().Count();
If you combine this with into your current grouping code, I think you'll have your solution.
simple and clean example of how group by works in LINQ
http://www.a2zmenu.com/LINQ/LINQ-to-SQL-Group-By-Operator.aspx
I wouldn't bother doing it in Linq2SQL. Create a stored Procedure for the query you want and understand and then create the object to the stored procedure in the framework or just connect direct to it.

Categories