Efficient way to query each element of a list - c#

I have to iterate through a collection of objects (let's say ID's), and execute a specific query for each of these objects. For example:
IEnumerable<int> ids = getIDs(); //[1,2,3,4...]
Right now I have this solution:
DBEntities db = new DBEntities();
var results =
from a in db.TABLEA
join b in db.TABLEB on a.id equals b.id
join c in db.TABLEC on b.oid equals c.oid
where ids.Contains(c.id)
select a;
Keep in mind, though, that the list of IDs is much smaller than the table I am searching. The solution above therefore seems inefficient, since it checks each record of the table against the list, when I want the opposite: to look up each element of the list in the table. I also do not want to iterate through the list and execute the query for one element at a time.
Ideally, I would want something like this:
DBEntities db = new DBEntities();
(some data structure) ids = getIDs();
var results =
from a in db.TABLEA
join b in db.TABLEB on a.id equals b.id
join c in db.TABLEC on b.oid equals c.oid
join i in ids on c.id equals i.id;
The (pseudo-)code above would iterate over the elements of the list within a single query, applying my filter for each element of the list.
Is this the way to do it? If so, what is the appropriate data structure to implement this solution? If not, which alternatives do I have?

Magnus's answer is true but not right :)
Technically you do have two options in newer versions of Entity Framework (and I discovered this by chance). Contains of course, but also Join.
Joining with a local sequence of primitive types has always been possible, but very quickly (after some tens of elements) raised a SqlException:
Some part of your SQL statement is nested too deeply. Rewrite the query or break it up into smaller queries.
EF tries to translate the local list to a temporary table in SQL. This is surprisingly non-trivial. It has to build the table by UNION-ing select statements that return 1 element each. This is what it used to look like with only 5 elements!
....
INNER JOIN (SELECT
[UnionAll3].[C1] AS [C1]
FROM (SELECT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[UnionAll1].[C1] AS [C1]
FROM (SELECT
1 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
2 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
UNION ALL
SELECT
3 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
UNION ALL
SELECT
4 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
UNION ALL
SELECT
5 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable5]) AS [UnionAll4] ON ....
As you see, if you look closely, the UNION statements are nested. The nesting level soon becomes too deep, which made this approach practically useless.
However, currently the SQL looks like this:
....
INNER JOIN (SELECT
1 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
2 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]
UNION ALL
SELECT
3 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]
UNION ALL
SELECT
4 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]
UNION ALL
SELECT
5 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable5]) AS [UnionAll4] ON ....
Still not a beauty, but the nesting is replaced by chaining and the list can contain hundreds of elements.
But... (and this is why Magnus's answer is true), it doesn't perform well. A simple test with 2000 elements in the list took 2.5s with join and .25s with Contains. So there's still no practical case for joining with a local sequence.
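The two query shapes discussed above can be compared side by side with a small in-memory stand-in. This is only a sketch: `Row`, `WithContains` and `WithJoin` are hypothetical names, and the `IQueryable<Row>` here is backed by an array rather than by Entity Framework, so it shows the LINQ forms and their results, not the SQL or the timings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocalListFilter
{
    // Hypothetical row type standing in for an entity such as TABLEC.
    public sealed class Row
    {
        public int Id { get; set; }
        public string Name { get; set; } = "";
    }

    // Option 1: filter with Contains. Against a database this is the form
    // that becomes WHERE ... IN (...), as described in the answers.
    public static List<Row> WithContains(IQueryable<Row> rows, IEnumerable<int> ids) =>
        rows.Where(r => ids.Contains(r.Id))
            .OrderBy(r => r.Id)
            .ToList();

    // Option 2: join the queryable source with the local sequence.
    public static List<Row> WithJoin(IQueryable<Row> rows, IEnumerable<int> ids) =>
        rows.Join(ids, r => r.Id, id => id, (r, id) => r)
            .OrderBy(r => r.Id)
            .ToList();
}
```

One difference in behavior to keep in mind: if the local list contains the same ID twice, the join returns the matching row twice, while Contains returns it once.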

If this is Linq2Sql (or Linq2Entities), your only option is the one in your first example. You cannot "join" a table with an in-memory list. You have to use Contains, which will be translated to a WHERE c.id IN (2,3,4,5,...) SQL query.

Related

Linq generate select with no from

In SQL I can do this:
select 'a' as MyColumn
So, I have a LINQ query with Entity Framework that gets some data from the database, but I need to union that query with one extra row. In SQL I can do this:
select ... from ...
union
select 'a' as MyColumn
How can I generate this query with LINQ?
I tried to do this:
var query = (from ... select new {..}).Union(new List<...> { new ...() { MyColumn = 'a' } })
But I guess that Entity Framework doesn't know how to translate that in-memory list to SQL.
I need to get an IQueryable result, not a List or other in-memory collection, because I need to join that result to other LINQ queries against the database in the future.
This isn't possible and you shouldn't do it. Both for the same reason: Entity Framework will try to translate the whole LINQ statement into SQL, including the local list (new List<...>).
The reason why it's not possible is that EF has no way to translate C# objects into SQL constructs.
The reason why you shouldn't do it is that it's incredibly wasteful: you build the list in C# code, EF (if it could) translates it into a SQL statement, the database runs the SQL statement and converts it to a result set, EF receives the result set and converts it into the list you originally offered it.
Just to demonstrate it, I'll show what happens if you do this with a list of primitive values which EF does know how to translate into SQL:
var ints = Enumerable.Range(1,5);
var res = Products.Select(c => c.Id).Union(ints).ToList();
This produces the following SQL statement:
SELECT
[Distinct1].[C1] AS [C1]
FROM ( SELECT DISTINCT
[UnionAll5].[ProductId] AS [C1]
FROM (SELECT
[Extent1].[ProductId] AS [ProductId]
FROM [dbo].[Product] AS [Extent1]
UNION ALL
SELECT
1 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
2 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]
UNION ALL
SELECT
3 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]
UNION ALL
SELECT
4 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]
UNION ALL
SELECT
5 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable5]) AS [UnionAll5]
) AS [Distinct1]
As you see, for each element in the list EF generated a SingleRowTablex entry to build a "temp table" to UNION with the ids from the actual query.
Conclusion: just query what you need from the database and add to the result afterwards. It's easy enough to do that:
(from ... select new {..})
.AsEnumerable() // continue in memory
.Union(...)
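The "query first, add in memory afterwards" pattern can be sketched with an in-memory stand-in. The names `UnionInMemory` and `NamesPlusExtra` are hypothetical, and the `IQueryable<string>` is assumed to represent the projected database query; in this sketch it is simply backed by an array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionInMemory
{
    // dbNames stands in for the projected query (from ... select ...).
    public static List<string> NamesPlusExtra(IQueryable<string> dbNames, string extra) =>
        dbNames
            .AsEnumerable()             // everything after this runs as LINQ to Objects
            .Union(new[] { extra })     // set union: duplicates are removed
            .ToList();
}
```

Because everything after AsEnumerable() runs in memory, the result is no longer an IQueryable and cannot be composed into further database queries.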

EntityFramework Group by not included in SQL statement

I'm trying to create a query similar to this:
select randomId
from myView
where ...
group by randomId
NOTE: EF doesn't support the distinct so I was thinking of going around the lack of it with the group by (or so I think)
randomId is numeric
Entity Framework V.6.0.2
This query returns the expected result in under 1 second.
When trying to do the same with EF I have been having some issues.
If I do the LINQ similar to this:
context.myView
.Where(...)
.GroupBy(mt => mt.randomId)
.Select(group => new { Id = group.Key, Count = group.Count() })
I get roughly the same result, but it forces a count and the query takes more than 6 seconds.
The SQL EF generates is something like this:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [randomId],
[GroupBy1].[A1] AS [C2]
FROM (
SELECT
[Extent1].[randomId] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
GROUP BY [Extent1].[randomId]
) AS [GroupBy1]
But if the COUNT is commented out of that SQL, the query is back to under 1 second.
If I change the Select to be like:
.Select(group => new { Id = group.Key })
I get all of the rows, with no GROUP BY in the SQL query and no DISTINCT whatsoever:
SELECT
[Extent1].[anotherField] AS [anotherField], -- 'this field got included automatically on this query and I dont know why, it doesnt affect outcome when removed in SQL server'
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
Other failed attempts:
query.GroupBy(x => x.randomId).Select(group => group.FirstOrDefault());
The query that was generated is as follows:
SELECT
[Limit1].ALL FIELDS,...
FROM (SELECT
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...) AS [Project1]
OUTER APPLY (SELECT TOP (1)
[Extent2].ALL FIELDS,...
FROM [dbo].[myView] AS [Extent2]
WHERE (...) AS [Limit1] -- same as the where above
This query performed rather poorly and still managed to return all Ids for the where clause.
Does anyone have an idea on how to force the usage of the group by without an aggregating function like a count?
In SQL it works but then again I have the distinct keyword as well...
Cheers,
J
var query = from p in TableName
select new {Id = p.ColumnNameId};
var distinctItems = query.Distinct().ToList();
Here is the LINQ query; you should be able to write an equivalent against an EF DbSet too. If you have issues, let me know.
Cheers!

EntityFramework generating poor tsql (linqpad generates much better)

I have a database structure as below
Family(1) ----- (*) FamilyPersons -----(1)Person(1)------(*) Expenses (1) -----(0..1)GroceriesDetails
Let me explain the relations. A Family can have one or more persons, and we have a mapping table FamilyPersons between Family and Person. Each person can enter his expenses, which go into the Expenses table. The Expenses table has a column ExpenseType (groceries, entertainment, etc.),
and the details of each of these expenses go into their own tables, so we have a GroceriesDetails table (and similar tables for the other types), with a 1 to 0..1 relation between Expenses and GroceriesDetails.
Now I am writing a query to get Complete GroceriesDetails for a family
GroceriesDetails.Where (g => g.Expenses.Person.FamilyPersons.Any(fp =>
fp.FamilyId == 1) && g.Expenses.ExpenseType == "GC" )
For this the sql generated by EF is
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Amount] AS [Amount]
FROM [dbo].[GroceriesDetails] AS [Extent1]
INNER JOIN (SELECT [Extent3].[Id] AS [Id1]
FROM [dbo].[Expenses] AS [Extent2]
INNER JOIN [dbo].[GroceriesDetails] AS [Extent3] ON [Extent2].[Id] = [Extent3].[Id]
WHERE N'GC' = [Extent2].[ExpenseType] ) AS [Filter1] ON [Extent1].[Id] = [Filter1].[Id1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Expenses] AS [Extent4]
INNER JOIN [dbo].[GroceriesDetails] AS [Extent5] ON [Extent4].[Id] = [Extent5].[Id]
INNER JOIN [dbo].[FamilyPerson] AS [Extent6] ON [Extent4].[PersonId] = [Extent6].[PersonId]
WHERE ([Extent1].[Id] = [Extent5].[Id]) AND (1 = [Extent6].[FamilyId])
)
In this query there is a full table join between Expenses and GroceriesDetails tables which is causing performance issues.
Whereas Linqpad generates a much better SQL
SELECT [t0].[Id], [t0].[Amount]
FROM [GroceriesDetails] AS [t0]
INNER JOIN [Expenses] AS [t1] ON [t1].[Id] = [t0].[Id]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Expenses] AS [t2]
INNER JOIN [Person] AS [t3] ON [t3].[Id] = [t2].[PersonId]
CROSS JOIN [FamilyPerson] AS [t4]
WHERE ([t4].[FamilyId] = @p0) AND ([t2].[Id] = [t0].[Id]) AND ([t4].[PersonId] = [t3].[Id])
)) AND ([t1].[ExpenseType] = @p1)
Please note that we are using WCF data services so this query is written against a WCF data service reference, so I can't traverse from top (family) to bottom (Groceries) as OData allows only one level of select.
Any help on optimizing this code is appreciated.
From the comments I learned that LinqPad uses Linq2SQL while the app uses EF, and that explains the difference.
The thing is that you have zero control over how EF generates SQL.
The only thing you can do is to rewrite your LINQ query to make it "closer" to desired SQL.
For example, instead of
GroceriesDetails.Where (g => g.Expenses.Person.FamilyPersons.Any(fp => fp.FamilyId == 1)
&& g.Expenses.ExpenseType == "GC" )
you can try to write something like this (a sketch; the join keys are taken from the generated SQL above, so adjust them to your model):
from g in GroceriesDetails
join e in Expenses on g.Id equals e.Id
join p in Persons on e.PersonId equals p.Id
join f in FamilyPersons on p.Id equals f.PersonId
where f.FamilyId == 1 && e.ExpenseType == "GC"
select g
It almost always helps, as it tells an ORM a straightforward way to transform the query into SQL. The idea is that the expression tree in the "original" case is more complex compared to the proposed one, and by simplifying the expression tree we make the translator's job easier and more straightforward.
But besides manipulating the LINQ there is no control over how it generates SQL from the expression tree.

The correct way to specify a sub query of a subquery using linq

results = (from r in results
where r.Buildings.Any(x=>x.StructuralElements.Any(s=>s.VALUE == Model.Bedrooms.ToString() && s.CATEGORY=="RM"))
select r);
I think I'm missing joins here. But maybe they are implied? The execution runs so long I can't do a watch to evaluate the generated query expression
The biggest problem in this query is this:
-- @p1 = Model.Bedrooms.ToString()
-- @p2 = "RM"
SELECT * FROM Results r WHERE EXISTS
(SELECT x.* FROM Results tr JOIN Buildings x ON tr.SomeID=x.SomeID WHERE tr.ID = r.ID AND EXISTS
(SELECT s.* FROM StructuralElements s JOIN Buildings tx ON tx.OtherID = s.OtherID WHERE tx.ID=x.ID AND s.VALUE = @p1 AND s.Category = @p2))
Do you see why this would be bad? For every Result, you're running a subquery (which itself runs another subquery). Because of these nested subqueries, the cost multiplies as you add rows at the root levels (Results and Buildings). Your best bet is to use joins and get distinct r values after you're done. The SQL would look like this:
SELECT DISTINCT
r.*
FROM
Results r
INNER JOIN Buildings x ON x.SomeID = r.SomeID
INNER JOIN StructuralElements s ON s.OtherID = x.OtherID
WHERE
s.VALUE = @p1 AND s.CATEGORY = @p2
The reason this works is that when you join and more than one row matches, the original row is duplicated once per match. The following illustration shows how the rows multiply:
IDs
R X S
1 - -
Join X
1 1 -
1 2 -
1 3 -
Join S
1 1 1
1 1 2
1 2 5
1 2 6
Assuming S=2 and S=6 meet your criteria, then it will return (in R,X,S form) rows 1,1,2 and 1,2,6. Getting just the distinct r in this case will only return R=1, which is what you're trying to accomplish. Using EF, the relationships already exist, so you don't need to do anything extra, just reference the columns you're trying to filter by:
results = (from r in results
from x in r.Buildings
from s in x.StructuralElements
where s.VALUE == Model.Bedrooms.ToString() && s.CATEGORY=="RM"
select r).Distinct();
This is the SelectMany operator at play (which takes a collection and flattens out subcollections into a single collection).
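The equivalence between the nested Any form and the flattened SelectMany-plus-Distinct form can be checked on in-memory objects. This is a minimal sketch, assuming hypothetical `Result`, `Building` and `StructuralElement` classes that only mirror the property names used in the question; it runs as LINQ to Objects and says nothing about the SQL a provider would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NestedFilter
{
    // Hypothetical classes mirroring the shape used in the question.
    public sealed class StructuralElement
    {
        public string Value { get; set; } = "";
        public string Category { get; set; } = "";
    }

    public sealed class Building
    {
        public List<StructuralElement> StructuralElements { get; set; } = new List<StructuralElement>();
    }

    public sealed class Result
    {
        public int Id { get; set; }
        public List<Building> Buildings { get; set; } = new List<Building>();
    }

    // Nested Any: one existence check inside another.
    public static List<int> WithAny(IEnumerable<Result> results, string bedrooms) =>
        results.Where(r => r.Buildings.Any(b => b.StructuralElements.Any(
                   s => s.Value == bedrooms && s.Category == "RM")))
               .Select(r => r.Id)
               .ToList();

    // Flattened: multiple from clauses compile to SelectMany, so a result
    // appears once per matching element and Distinct collapses the repeats.
    public static List<int> WithSelectMany(IEnumerable<Result> results, string bedrooms) =>
        (from r in results
         from b in r.Buildings
         from s in b.StructuralElements
         where s.Value == bedrooms && s.Category == "RM"
         select r.Id).Distinct().ToList();
}
```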

LINQ to Entities find top records from ordered groupings

I have a problem that I know how to solve in SQL but not with Linq to Entities.
My data looks like this:
ID GROUP TIMESTAMP
-- ----- ---------
1 A 2011-06-20
2 A 2011-06-21
3 B 2011-06-21
4 B 2011-06-22
5 B 2011-06-23
6 C 2011-06-30
I want to retrieve all the Entity objects (not just the ID) such that I only get the most recent record from each group (i.e. the records with ids 2, 5, 6).
In SQL I would do something like this:
SELECT * FROM my_table a
WHERE a.timestamp =
(SELECT MAX(timestamp) FROM my_table b
WHERE a.group = b.group)
(For the sake of this question you can assume that timestamp is unique within each group).
I'd like to do this query against a WCF Data Service using Linq to Entities but I can't seem to have a nested query that references the outside query like this. Can anyone help?
Possibly not as clean and efficient as the hand-written version, but here's what I came up with:
var q = from a in db.MyEntities
where a.Timestamp == (from b in db.MyEntities
where b.Group == a.Group
select b.Timestamp).Max()
select a;
which translates into this SQL
SELECT
[Project1].[Id] AS [Id],
[Project1].[Group] AS [Group],
[Project1].[Timestamp] AS [Timestamp]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Group] AS [Group],
[Extent1].[Timestamp] AS [Timestamp],
[SSQTAB1].[A1] AS [C1]
FROM [MyEntities] AS [Extent1]
OUTER APPLY
(SELECT
MAX([Extent2].[Timestamp]) AS [A1]
FROM [MyEntities] AS [Extent2]
WHERE [Extent2].[Group] = [Extent1].[Group]) AS [SSQTAB1]
) AS [Project1]
WHERE [Project1].[Timestamp] = [Project1].[C1]
Try Linqer, a tool that converts SQL statements to LINQ queries.
This should work:
var query = db.my_table
.GroupBy(p=>p.group)
.Select(p=>p.OrderByDescending(q=>q.timestamp).First());
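The group-then-take-latest approach above can be tried against the sample data from the question using plain in-memory objects. This is a sketch only: the `Item` class and `LatestIds` method are hypothetical, and the code runs as LINQ to Objects, so it does not show whether a given provider (or a WCF Data Service) can translate the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestPerGroup
{
    // Hypothetical entity mirroring the ID / GROUP / TIMESTAMP columns.
    public sealed class Item
    {
        public int Id { get; set; }
        public string Group { get; set; } = "";
        public DateTime Timestamp { get; set; }
    }

    // For each group, keep the record with the greatest timestamp,
    // then return the IDs in ascending order.
    public static List<int> LatestIds(IEnumerable<Item> items) =>
        items.GroupBy(i => i.Group)
             .Select(g => g.OrderByDescending(i => i.Timestamp).First())
             .OrderBy(i => i.Id)
             .Select(i => i.Id)
             .ToList();
}
```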
Here you go. A simple way to do it:
var result = (from x in my_table
group x by x.Group into g
select new
{
g.Key,
timestamp = g.Max(x => x.TimeStamp),
g //This will return everything in g
});
