EF query using Exists throws OutOfMemoryException - C#

I'm trying to get a list of Shop_Orders whose order_num value does not exist in a table called WarehouseOrders. This table contains 500K records and there is an index on OrderNo. The List<Shop_Order> contains approximately 150 items, each of which has an order_num. When this code is executed, it is very slow and inefficient and results in an OutOfMemoryException. Is there a better way to do this?
List<Shop_Order> new_orders = (from a in osource.order
where !ctx.WarehouseOrders.ToList()
.Exists(o => o.OrderNo == a.order_num) select a).ToList();

WarehouseOrders.ToList() downloads all warehouse orders into memory, and because the call sits inside the where clause, it does so again for every shop order. You can at least avoid that by using Queryable.Any to check the condition on the database side:
!ctx.WarehouseOrders.Any(o => o.OrderNo == a.order_num)
But that will still run a separate database query for each shop order. I assume you can get the required shop orders with a single database query instead, e.g. by group-joining shop orders with warehouse orders and selecting only those that have no matches. Something like:
from so in ctx.ShopOrders
join wo in ctx.WarehouseOrders on so.order_num equals wo.OrderNo into g
where !g.Any()
select so
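A minimal, self-contained sketch of the idea, run in memory so it needs no database. The records ShopOrder and WarehouseOrder are hypothetical stand-ins that model only the key columns. MissingByGroupJoin is the group-join query above. MissingByContains is an alternative (not from the answer) for the case in the question, where the ~150 shop orders are already in a List: it asks the warehouse table only about those order numbers (EF translates Contains on a local list into a SQL IN clause) and then filters locally with a HashSet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities; only the key columns are modelled.
public record ShopOrder(string order_num);
public record WarehouseOrder(string OrderNo);

public static class AntiJoin
{
    // Group-join anti-join: keep shop orders whose group of matching warehouse orders is empty.
    public static List<ShopOrder> MissingByGroupJoin(
        IEnumerable<ShopOrder> shopOrders, IEnumerable<WarehouseOrder> warehouseOrders) =>
        (from so in shopOrders
         join wo in warehouseOrders on so.order_num equals wo.OrderNo into g
         where !g.Any()
         select so).ToList();

    // For shop orders already held in memory: ask the warehouse table only about the
    // order numbers we have (EF turns wanted.Contains(...) into SQL IN (...)),
    // then filter locally with a HashSet.
    public static List<ShopOrder> MissingByContains(
        List<ShopOrder> shopOrders, IQueryable<WarehouseOrder> warehouseOrders)
    {
        List<string> wanted = shopOrders.Select(o => o.order_num).Distinct().ToList();
        var existing = new HashSet<string>(
            warehouseOrders.Where(w => wanted.Contains(w.OrderNo))
                           .Select(w => w.OrderNo));
        return shopOrders.Where(o => !existing.Contains(o.order_num)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var shop = new List<ShopOrder> { new("A1"), new("A2"), new("A3") };
        var warehouse = new List<WarehouseOrder> { new("A2"), new("B9") };

        // Both lines print A1,A3: those order numbers are absent from the warehouse list.
        Console.WriteLine(string.Join(",", AntiJoin.MissingByGroupJoin(shop, warehouse).Select(o => o.order_num)));
        Console.WriteLine(string.Join(",", AntiJoin.MissingByContains(shop, warehouse.AsQueryable()).Select(o => o.order_num)));
    }
}
```

Against a real context you would pass ctx.WarehouseOrders as the IQueryable, so only the matching order numbers come back over the wire rather than all 500K rows.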

Related

NHibernate linq fetch query across multiple joins with subquery causes incorrect sql

I have four tables in a SQL Server database:
Part
-----
Id (PK)
LineId (FK)
other fields...
Line
-----
Id (PK)
ProcessId (FK)
other fields...
Process
-----
Id (PK)
ProcessTypeId (FK)
other fields...
ProcessType
-----
Id (PK)
other fields...
I am trying to use a LINQ query with Fetch to hydrate these entities and then map the result to a view model DTO.
I am using two queries. The first is on Part and applies a filter to narrow down the result:
var partids = s.Query<Part>()
.Where(p => p.Line.Process.ProcessType.Id == processTypeId)
.Select(p => p.Id);
I then use this first query as a subquery in a second query that eager-loads the related entities:
var q = s.Query<Part>()
.Fetch(p => p.Line)
.ThenFetch(l => l.Process)
.ThenFetch(pr => pr.ProcessType)
.Where(p => partids.Contains(p.Id))
.ToList();
Though this query works, I noticed that it was taking a very long time to load. So, using a profiler, I looked at the generated SQL, which was:
SELECT part0_.Id AS Id0_0_,
line1_.Id AS Id1_1_,
process2_.Id AS Id2_2_,
process3_.Id AS Id3_3_,
part0_.Name AS Name0_0_,
part0_.LineId AS Line3_0_0_,
line1_.Name AS Name1_1_,
line1_.ProcessId AS Proccess3_1_1_,
process2_.Name AS Name2_2_,
process2_.ProcessTypeId AS Proccess3_2_2_,
process3_.Name AS Name3_3_
FROM part part0_
LEFT OUTER JOIN Line line1_ ON part0_.LineId=line1_.Id
LEFT OUTER JOIN Process process2_ ON line1_.ProcessId=process2_.Id
LEFT OUTER JOIN ProcessType process3_ ON process2_.ProcessTypeId=process3_.Id
WHERE part0_.Id IN
( SELECT part4_.Id
FROM Part Part4_
INNER JOIN Line Line5_ ON Part4_.LineId=Line5_.Id
WHERE process2_.ProcessTypeId= 126 );
The subquery joining back onto the main query (its WHERE clause references the outer alias process2_) is causing this to run extremely slowly in most cases.
I would have expected the generated SQL to be this:
SELECT part0_.Id AS Id0_0_,
line1_.Id AS Id1_1_,
process2_.Id AS Id2_2_,
process3_.Id AS Id3_3_,
part0_.Name AS Name0_0_,
part0_.LineId AS Line3_0_0_,
line1_.Name AS Name1_1_,
line1_.ProcessId AS Proccess3_1_1_,
process2_.Name AS Name2_2_,
process2_.ProcessTypeId AS Proccess3_2_2_,
process3_.Name AS Name3_3_
FROM part part0_
LEFT OUTER JOIN Line line1_ ON part0_.LineId=line1_.Id
LEFT OUTER JOIN Process process2_ ON line1_.ProcessId=process2_.Id
LEFT OUTER JOIN ProcessType process3_ ON process2_.ProcessTypeId=process3_.Id
WHERE part0_.Id IN
( SELECT part4_.Id
FROM Part part4_
INNER JOIN Line Line5_ ON part4_.LineId=Line5_.Id
INNER JOIN Process Process6_ ON Line5_.ProcessId=Process6_.Id
WHERE Process6_.ProcessTypeId= 126 );
I am using NHibernate 4 with the LINQ provider for all of my queries. Am I missing something in my LINQ query here?
The workaround I use at the moment is to materialize the partids query with ToList and then use the list of ids from memory. However, I would like to avoid two round trips to the database in this scenario if possible.
It is not currently feasible for me to use the QueryOver or HQL APIs because all of my querying and filtering code uses LINQ.
Please help!
Unfortunately, I think the workaround that you're currently using is your best option.
This is a bug in NHibernate; it currently affects version 4, and I'm pretty sure it was there in 3.3 too.
The bug has been reported.

LINQ query displaying result twice

pd is my page:
ProductDetail pd = new ProductDetail();
Fetching the data and storing it in data:
var data =
from product in db.Products
from orders in db.Orders
from od in db.OrderDetails
from dpt in db.Dpts
where orders.CId.Equals(
(from name in db.Companies
where name.Cname.Equals(selectedcomp)
select name.CId).FirstOrDefault())
&& od.OrdId.Equals(orders.OrdId)
&& product.PId.Equals(od.PId)
select new
{
orders.Billno ,
orders.Date,
orders.pharm ,
product.Pname,
product.Purchasedate,
product.Purchaserate,
product.Salesrate,
product.Supplier,
od.Quantity,
od.Amount
};
It displays the values of data twice in the list box:
pd.ProductDescription.ItemsSource =
(from dat in data
select dat).ToList();
I think your problem is the cross join to the Dpts table. You don't use the results from that table in the where or select, so I don't think you need it. Try removing from dpt in db.Dpts and see if that fixes your problem. My guess is that you are getting n duplicates where n is the total number of rows in db.Dpts.
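To see this concretely, here is a small in-memory sketch with hypothetical data (not the question's schema): an extra from clause over a table that is never referenced still pairs every row with every row of that table, so the row count is multiplied by that table's size.

```csharp
using System;
using System.Linq;

public static class CrossJoinDemo
{
    // Two hypothetical order-detail rows: (OrdId, PId).
    static readonly (int OrdId, int PId)[] OrderDetails = { (1, 10), (1, 11) };

    // Rows returned when an unused table with `dptRows` rows is added as an extra `from`.
    public static int RowCount(int dptRows)
    {
        var dpts = Enumerable.Range(1, dptRows);
        var query =
            from od in OrderDetails
            from dpt in dpts   // never referenced below, yet it still multiplies the rows
            select od;
        return query.Count();
    }

    public static void Main()
    {
        Console.WriteLine(OrderDetails.Length); // 2 rows without the extra table
        Console.WriteLine(RowCount(1));         // 2
        Console.WriteLine(RowCount(2));         // 4: each row appears twice
    }
}
```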
@juharr is right. You don't use db.Dpts in the where clause, so the cross join duplicates the results. Your db.Dpts table probably has 2 records.
I think that you should work on your query.
A few tips:
Don't use cross joins; use join clauses instead
You should get CId before executing your main query
You probably don't have foreign keys in your database; you should consider adding them
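A sketch of the first two tips applied to the question's query, using hypothetical in-memory records that carry only the columns involved: the company id is resolved once up front, and the tables are connected with explicit join clauses, so Dpts (which the query never used) simply drops out. It illustrates the shape under those assumptions and is not a drop-in replacement for the asker's schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the question's tables (only the columns used here).
public record Company(int CId, string Cname);
public record Order(int OrdId, int CId, string Billno);
public record OrderDetail(int OrdId, int PId, int Quantity);
public record Product(int PId, string Pname);

public static class JoinRewrite
{
    public static List<(string Billno, string Pname, int Quantity)> Lines(
        string selectedcomp,
        IEnumerable<Company> companies,
        IEnumerable<Order> orders,
        IEnumerable<OrderDetail> details,
        IEnumerable<Product> products)
    {
        // Tip 2: look up the company id once, before the main query.
        int cid = companies.Where(c => c.Cname == selectedcomp)
                           .Select(c => c.CId)
                           .FirstOrDefault();

        // Tip 1: explicit joins on keys; every table is tied to another one,
        // so no unconnected table can multiply the rows.
        var query =
            from o in orders
            where o.CId == cid
            join od in details on o.OrdId equals od.OrdId
            join p in products on od.PId equals p.PId
            select (o.Billno, p.Pname, od.Quantity);

        return query.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var lines = JoinRewrite.Lines("Acme",
            new[] { new Company(1, "Acme") },
            new[] { new Order(100, 1, "B1") },
            new[] { new OrderDetail(100, 10, 3) },
            new[] { new Product(10, "Aspirin") });
        foreach (var l in lines) Console.WriteLine($"{l.Billno} {l.Pname} {l.Quantity}");
    }
}
```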
When getting duplicate items from a query in LINQ or SQL, first check for bad data, then investigate it as a bad join. Your best bet is to strip it down to one table, then add the joins in one-by-one until you get your duplication. Then make that join specific enough that it stops duplicating.

Efficient way to count related entities of a many to many relation in EF

I would like to know how to efficiently count, on the SQL Server side, the number of distinct results for a specific range of a related entity that has a many-to-many relationship.
This is the current situation in Entity Framework:
Table1 1<------->∞ Table2
Table2 ∞<------->∞ Table4
Table2 and Table4 have a many-to-many relationship and are linked through Table3 in SQL.
What I want is the distinct count of Table4 results related to a specific range of Table1.
In LINQ to SQL the query is this:
(from dc in Table1
join vc in Table2 on dc.Table1Id equals vc.Table2Id
join vcac in Table3 on vc.Table2Id equals vcac.Table3Id
join ac in Table4 on vcac.Table3Id equals ac.Table4Id
where dc.Table1Id > 200000
group ac by ac.Table4Id into nieuw
select new { acid= nieuw.Key}).Count()
This lets SQL server return the count directly.
Because the extra table for the many-to-many relation (Table3) is not exposed as an entity in Entity Framework, I have had problems converting this query to LINQ to Entities in query syntax (since I cannot join Table4 with Table2 using an inner join).
I have this in chained (method) syntax, but is it efficient? Does it fetch the whole list, or does it let SQL Server do the count? Table2 contains about 30,000 entries, and I don't want to fetch them just to count them:
context.Table4.Where(a => a.Table2.Any(v => v.Table1Id > 200000)).Select(a => a.Table4Id).Distinct().Count();
How would I go about converting the LINQ to SQL query into LINQ to Entities query syntax? Or is the chained syntax fine in this situation?
The .Select() method uses deferred execution, meaning the query won't actually run until you need the results. At that point in the chain it still exists only as a query definition. You then add .Distinct() before calling .Count(), and only Count() queries the database, as a single SQL statement (typically a COUNT over a DISTINCT or GROUP BY subquery). So you should be good.
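As a sketch of the query-syntax form the question asks about: a second from clause over a navigation collection (which compiles to SelectMany) can take the place of the explicit joins through Table3. The model below is a hypothetical in-memory stand-in, queried through AsQueryable so both forms can be compared; with EF, both should be composed into one server-side query that runs only when Count() is called, though the exact SQL depends on the EF version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model: EF hides the Table3 link table and exposes
// the many-to-many as a collection on each side.
public class Table2
{
    public int Table1Id { get; set; }
    public List<Table4> Table4 { get; } = new();
}

public class Table4
{
    public int Table4Id { get; set; }
    public List<Table2> Table2 { get; } = new();
}

public static class DistinctCount
{
    public static void Link(Table2 t2, Table4 t4) { t2.Table4.Add(t4); t4.Table2.Add(t2); }

    // Method syntax from the question.
    public static int MethodSyntax(IQueryable<Table4> table4) =>
        table4.Where(a => a.Table2.Any(v => v.Table1Id > 200000))
              .Select(a => a.Table4Id)
              .Distinct()
              .Count();

    // Query syntax: the second `from` walks the navigation collection instead of
    // joining through Table3 by hand. Nothing executes until Count() is called.
    public static int QuerySyntax(IQueryable<Table2> table2) =>
        (from vc in table2
         where vc.Table1Id > 200000
         from ac in vc.Table4
         select ac.Table4Id).Distinct().Count();
}

public static class Program
{
    public static void Main()
    {
        var a = new Table4 { Table4Id = 1 };
        var b = new Table4 { Table4Id = 2 };
        var c = new Table4 { Table4Id = 3 };
        var x = new Table2 { Table1Id = 300000 };
        var y = new Table2 { Table1Id = 250000 };
        var z = new Table2 { Table1Id = 100 };
        DistinctCount.Link(x, a); DistinctCount.Link(x, b); DistinctCount.Link(y, b); DistinctCount.Link(z, c);

        // Both print 2: only Table4 rows 1 and 2 link to a Table2 row above the threshold.
        Console.WriteLine(DistinctCount.MethodSyntax(new[] { a, b, c }.AsQueryable()));
        Console.WriteLine(DistinctCount.QuerySyntax(new[] { x, y, z }.AsQueryable()));
    }
}
```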

Linq to Entities many to many selection: How to force the generation of a JOIN instead of a subselect clause?

Using EF database-first, I have two entities (Supplier, Product) that have a many-to-many relationship. Entity Framework does not create an entity for the association table (SupplierProduct) because that table contains only the primary keys of the two related entities.
I have been getting all Suppliers that do not supply a given product with the following query:
var q1 = context.Suppliers.Where(s=>!s.Products.Any(p=>p.Id == 1));
The SQL produced uses an EXISTS dependent subquery similar to this:
SELECT *
FROM Suppliers s
WHERE NOT EXISTS
(SELECT 1
FROM SupplierProduct sp WHERE sp.SupplierId = s.Id AND sp.ProductId = 1)
Is it possible, using Linq to Entities method syntax, to produce a query that uses joins on the associated table instead?
i.e.:
SELECT DISTINCT s.*
FROM SupplierProduct sp
JOIN Supplier s ON s.Id = sp.SupplierId
WHERE sp.ProductId != 1;
Update
As JoeEnos pointed out, my queries above don't do the same thing. The NOT EXISTS subquery is probably the best way to go here. What if I were trying to get all suppliers who do supply a product? I would change my LINQ to Entities query slightly to:
var q1 = context.Suppliers.Where(s => s.Products.Any(p=>p.Id == 1));
And the SQL generated would be:
SELECT *
FROM Suppliers s
WHERE EXISTS
(SELECT 1
FROM SupplierProduct sp WHERE sp.SupplierId = s.Id AND sp.ProductId = 1)
Which is fine, I get the result I want. However if I was writing SQL in this case I would normally do:
SELECT s.*
FROM SupplierProduct sp
JOIN Supplier s ON s.Id = sp.SupplierId
WHERE sp.ProductId = 1;
Can my LINQ to Entities query be changed to produce the above SQL?
To generate SQL that uses a join instead of EXISTS when selecting an entity based on its m:n association with another entity, SelectMany() can be used. E.g.:
var q1 = context.Suppliers.Where(s => s.Products.Any(p=>p.Id == 1));
Can be rewritten to:
var q1 = context.Products.Where(p => p.Id == 1).SelectMany(p => p.Suppliers);
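A small in-memory check, with hypothetical Supplier/Product classes mirroring the question's navigation properties, that the Any form and the SelectMany form return the same suppliers. With EF the SelectMany form is the one expected to be translated into a join through SupplierProduct, though the exact SQL depends on the provider and version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical classes mirroring the question's many-to-many navigation properties.
public class Supplier
{
    public int Id { get; set; }
    public List<Product> Products { get; } = new();
}

public class Product
{
    public int Id { get; set; }
    public List<Supplier> Suppliers { get; } = new();
}

public static class ManyToMany
{
    public static void Link(Supplier s, Product p) { s.Products.Add(p); p.Suppliers.Add(s); }

    // Filter suppliers through their products (EF renders this as EXISTS).
    public static List<int> ViaAny(IQueryable<Supplier> suppliers, int productId) =>
        suppliers.Where(s => s.Products.Any(p => p.Id == productId))
                 .Select(s => s.Id).OrderBy(id => id).ToList();

    // Start from the product and navigate to its suppliers (the join-shaped form).
    public static List<int> ViaSelectMany(IQueryable<Product> products, int productId) =>
        products.Where(p => p.Id == productId)
                .SelectMany(p => p.Suppliers)
                .Select(s => s.Id).OrderBy(id => id).ToList();
}

public static class Program
{
    public static void Main()
    {
        var s1 = new Supplier { Id = 1 }; var s2 = new Supplier { Id = 2 }; var s3 = new Supplier { Id = 3 };
        var p1 = new Product { Id = 1 }; var p2 = new Product { Id = 2 };
        ManyToMany.Link(s1, p1); ManyToMany.Link(s1, p2); ManyToMany.Link(s3, p1); ManyToMany.Link(s2, p2);

        // Both print 1,3
        Console.WriteLine(string.Join(",", ManyToMany.ViaAny(new[] { s1, s2, s3 }.AsQueryable(), 1)));
        Console.WriteLine(string.Join(",", ManyToMany.ViaSelectMany(new[] { p1, p2 }.AsQueryable(), 1)));
    }
}
```

If the filter on Products could match more than one product, SelectMany may return the same supplier several times, so add Distinct() in that case.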
Your two queries do very different things. Your Any/EXISTS query gets suppliers who do not have product 1 at all. Your JOIN query gets all suppliers who have any products other than 1, regardless of whether or not they also have product 1.
I don't think you can do what you're looking for with just a JOIN and WHERE - you can do it with an IN clause, but I think the EXISTS query is the most correct way of looking for your data.
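The difference is easy to see on a few hypothetical link-table rows, sketched in memory below: supplier 1 supplies products 1 and 2, supplier 2 only product 2, supplier 3 only product 1, and supplier 4 nothing.

```csharp
using System;
using System.Linq;

public static class SemanticsDemo
{
    // Hypothetical SupplierProduct link-table rows.
    static readonly (int SupplierId, int ProductId)[] SupplierProduct =
    {
        (1, 1), (1, 2), // supplier 1 supplies products 1 and 2
        (2, 2),         // supplier 2 supplies only product 2
        (3, 1),         // supplier 3 supplies only product 1
    };

    static readonly int[] Suppliers = { 1, 2, 3, 4 }; // supplier 4 supplies nothing

    // WHERE NOT EXISTS (... ProductId = 1): suppliers that do not supply product 1.
    public static int[] NotExists() =>
        Suppliers.Where(s => !SupplierProduct.Any(sp => sp.SupplierId == s && sp.ProductId == 1))
                 .ToArray();

    // SELECT DISTINCT ... JOIN ... WHERE ProductId != 1: suppliers with some other product.
    public static int[] JoinNotEqual() =>
        SupplierProduct.Where(sp => sp.ProductId != 1)
                       .Select(sp => sp.SupplierId)
                       .Distinct()
                       .OrderBy(id => id)
                       .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(",", NotExists()));    // 2,4
        Console.WriteLine(string.Join(",", JoinNotEqual())); // 1,2
    }
}
```

The JOIN version includes supplier 1, who does supply product 1, and misses supplier 4, who has no rows to join at all.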
In any case, one of the wonderful things about Entity Framework is that you don't have to worry about what gets generated - as long as the LINQ statement is ok, then it will find the best way of writing the query, and you should never have to look at it. That's especially true when you do paging and other things like that, where the LINQ is simple, but the generated SQL is horribly ugly.

Mapping from relational database

For one-to-one relationships, things are easy.
When it comes to one-to-many or many-to-many, problems appear...
I am not using an ORM tool right now, for many reasons, and I am wondering whether, when I want to get data, it is better to reassemble a one-to-many relationship using multiple queries or in code.
For example, say I have a class Category and a class Product.
The Product table has a column for the category id (one category, many products).
So for my full catalog, is it better to execute 2 queries to get the categories and products I want and then populate each category's Products list? (It is very easy with LINQ.)
Or should I run a query for each category? Like: select id from products where category_id=5;
Also, I don't know how to name the functions that specify whether or not to fetch the other side of the relationship.
You should always use the least number of queries possible to retrieve your data. Executing one query per category to load the products is known as the N+1 problem, and can quickly cause a bottleneck in your code.
As far as what to name your methods that specify your fetch plans, name them after what the method actually does, such as IncludeProducts or WithProducts.
If you want to retrieve all categories and all their products, you can either select all categories and then select all products in two queries, or you can select in one query and group.
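A sketch of the two-query approach, assuming the rows from the two SELECTs have already been read into the hypothetical CategoryRow and ProductRow records: the products are indexed by category once with ToLookup, so assembling the object graph is a single pass rather than one query per category.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows as read from "SELECT ... FROM Category" and "SELECT ... FROM Product".
public record CategoryRow(int CategoryId, string Name);
public record ProductRow(int ProductId, int CategoryId, string Name);

public class Category
{
    public int CategoryId { get; init; }
    public string Name { get; init; } = "";
    public List<ProductRow> Products { get; init; } = new();
}

public static class TwoQueryAssembly
{
    public static List<Category> Assemble(
        IEnumerable<CategoryRow> categories, IEnumerable<ProductRow> products)
    {
        // Index all products by category in one pass instead of querying per category.
        ILookup<int, ProductRow> byCategory = products.ToLookup(p => p.CategoryId);

        return categories.Select(c => new Category
        {
            CategoryId = c.CategoryId,
            Name = c.Name,
            // A lookup yields an empty sequence for an unknown key, so empty categories are kept.
            Products = byCategory[c.CategoryId].ToList()
        }).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var cats = new[] { new CategoryRow(1, "Tools"), new CategoryRow(2, "Toys") };
        var prods = new[] { new ProductRow(10, 1, "Hammer"), new ProductRow(11, 1, "Saw") };
        foreach (var c in TwoQueryAssembly.Assemble(cats, prods))
            Console.WriteLine($"{c.Name}: {c.Products.Count}"); // Tools: 2, then Toys: 0
    }
}
```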
To use just one query, inner join your two tables:
SELECT c.*, p.*
FROM Category c INNER JOIN Product p ON c.CategoryId = p.CategoryId
and then construct business objects from the resulting dataset:
result.GroupBy(r => r.CategoryId).Select(group =>
new Category(/* new Category using c.* columns */)
{
Products = /* new list of Products from p.* values */
});
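A concrete version of the grouping above, assuming each joined row has been read into a hypothetical JoinedRow record (category columns followed by product columns). Grouping on the category key rebuilds one Category per group, together with its products.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row of the joined result: c.* columns, then p.* columns.
public record JoinedRow(int CategoryId, string CategoryName, int ProductId, string ProductName);
public record Product(int ProductId, string Name);
public record Category(int CategoryId, string Name, List<Product> Products);

public static class OneQueryAssembly
{
    public static List<Category> Assemble(IEnumerable<JoinedRow> rows) =>
        rows.GroupBy(r => new { r.CategoryId, r.CategoryName })   // one group per category
            .Select(g => new Category(
                g.Key.CategoryId,
                g.Key.CategoryName,
                g.Select(r => new Product(r.ProductId, r.ProductName)).ToList()))
            .ToList();
}

public static class Program
{
    public static void Main()
    {
        var rows = new[]
        {
            new JoinedRow(1, "Tools", 10, "Hammer"),
            new JoinedRow(1, "Tools", 11, "Saw"),
            new JoinedRow(2, "Toys", 20, "Kite"),
        };
        foreach (var c in OneQueryAssembly.Assemble(rows))
            Console.WriteLine($"{c.Name}: {string.Join(", ", c.Products.Select(p => p.Name))}");
    }
}
```

Because of the INNER JOIN, categories without products are absent from the result; a LEFT JOIN plus a check for null product columns would keep them.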
But I have to ask - why aren't you using an ORM?
