Linq to SQL Slow Query - c#

My ASP.Net application has the following Linq to SQL function to get a distinct list of height values from the product table.
public static List<string> getHeightList(string catID)
{
using (CategoriesClassesDataContext db = new CategoriesClassesDataContext())
{
var heightTable = (from p in db.Products
join cp in db.CatProducts on p.ProductID equals cp.ProductID
where p.Enabled == true && (p.CaseOnly == null || p.CaseOnly == false) && cp.CatID == catID
select new { Height = p.Height, sort = Convert.ToDecimal(p.Height.Replace("\"", "")) }).Distinct().OrderBy(s => s.sort);
List<string> heightList = new List<string>();
foreach (var s in heightTable)
{
heightList.Add(s.Height.ToString());
}
return heightList;
}
}
I ran Redgate SQL Monitor which shows that this query is using a lot of resources.
Redgate is also showing that I am running the following query:
select count(distinct [height]) from product p
join catproduct cp on p.productid = cp.productid
join cat c on cp.catid = c.catid
where p.enabled=1 and p.displayfilter = 1 and c.catid = 'C2-14'
My questions are:
How can I change the function so that it uses fewer resources?
Also, how does linq to sql generate the above query from my function? (I did not write select count(distinct [height]) from product anywhere in the code)
The Products table has 90,000 records. The category whose distinct heights I am trying to list has 50,000 product records.
Thank you in advance,
Nick

First of all, your posted SQL query and LINQ query don't match at all. It's not the LINQ query but the underlying SQL query itself that is performing slowly. Make sure all the columns involved in the JOIN ON clause, the WHERE clause, and the ORDER BY clause are indexed properly so that you get a better execution plan; otherwise you will end up with a full table scan and a file sort, and the query will perform slowly.

The join multiplies the number of Products the query returns. To undo that, you apply Distinct at the end. It will certainly reduce db resources if you return unique Products right away:
var heightTable = (from p in db.Products
where p.CatProducts.Any(cp => cp.CatID == catID)
&& p.Enabled == true && (p.CaseOnly == null || p.CaseOnly == false)
select new
{
Height = p.Height,
sort = Convert.ToDecimal(p.Height.Replace("\"", ""))
}).OrderBy(s => s.sort);
This changes the join into a where clause. It saves the db engine the trouble of deduplicating the result.
If that still performs poorly, you should try to do the conversion and ordering in memory, i.e. after receiving the raw results from the database.
As for the count, I don't know where it comes from. Such queries typically get generated by paging libraries such as PagedList, but I see no trace of that in your code.
Side note: you can return ...
heightTable.Select(x => x.Height.ToString()).ToList()
... instead of creating the list yourself.
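The suggestion above to do the conversion and ordering in memory could look roughly like the following. This is only a sketch under assumptions: the array stands in for the raw height strings the database would return, and the class and method names (HeightSortSketch, SortHeights) are hypothetical, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class HeightSortSketch
{
    // Hypothetical helper: takes raw height strings such as 10" and
    // removes duplicates, then sorts them numerically in memory.
    public static List<string> SortHeights(IEnumerable<string> rawHeights)
    {
        return rawHeights
            .Distinct()
            // Strip the inch mark and parse with a fixed culture so "2.5" is read as a decimal.
            .OrderBy(h => decimal.Parse(h.Replace("\"", ""), CultureInfo.InvariantCulture))
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for the strings a database query would return (assumption).
        var fromDatabase = new[] { "10\"", "2.5\"", "10\"", "7\"" };
        foreach (var height in SortHeights(fromDatabase))
            Console.WriteLine(height);
    }
}
```

The database then only has to filter and return strings; the parsing and sorting cost moves to the web server, which is usually cheap for a short list of distinct values.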

Related

How to join a list and large lists/tables using LINQ

Initially I have such a list :
List<Car> cars = db.Car.Where(x => x.ProductionYear == 2005).ToList();
Then I'm trying to join this list with two large tables using LINQ like this :
var joinedList = (from car in cars
join driver in db.Driver.ToList()
on car.Id equals driver.CarId
join building in db.Building.ToList()
on driver.BuildingId equals building.Id
select new Building
{
Name = building.Name,
Id = building.Id,
City = building.City
}).ToList();
Both Driver and Building tables have about 1 million rows. When I run this join I get out of memory exception. How can I make this join work? Should I make the join operation on database? If yes, how can I carry cars list to the db? Thanks in advance.
Even if you remove the .ToList() calls inside your join, your code will still pull all the data and perform the join in memory rather than in SQL Server. This is because you're using the local list cars in your join. The code below should solve your problem:
var joinedList = (from car in db.Car.Where(x => x.ProductionYear == 2005)
join driver in db.Driver
on car.Id equals driver.CarId
join building in db.Building
on driver.BuildingId equals building.Id
select new Building
{
Name = building.Name,
Id = building.Id,
City = building.City
}).ToList();
You can remove the last .ToList() and do some paging if you expect to get too many records in the results.
Even after removing .ToList(), replace it with .AsQueryable().
AsQueryable keeps the query composable on the server, whereas ToList and AsEnumerable pull the rows into memory first.
If you create an IQueryable, then the query may be converted to sql
and run on the database server
If you create an IEnumerable, then all rows will be pulled into
memory as objects before running the query.
In both cases if you don't call a ToList() or ToArray() then query
will be executed each time it is used, so, say, you have an
IQueryable and you fill 4 list boxes from it, then the query will be
run against the database 4 times.
So use the following LINQ query:
var joinedList = (from car in db.Car.Where(x => x.ProductionYear == 2005).AsQueryable()
join driver in db.Driver.AsQueryable()
on car.Id equals driver.CarId
join building in db.Building.AsQueryable()
on driver.BuildingId equals building.Id
select new Building
{
Name = building.Name,
Id = building.Id,
City = building.City,
}).ToList();
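The point above about a query being executed each time it is used can be observed with plain LINQ to Objects. The sketch below is an assumption-laden illustration: a counting iterator stands in for the database, and the names (DeferredExecutionSketch, CountEnumerations) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Returns how many times the source was enumerated when the
    // query is consumed `uses` times, with or without ToList().
    public static int CountEnumerations(int uses, bool materialize)
    {
        int runs = 0;

        // Stand-in for a database table: counts each full enumeration.
        IEnumerable<int> Source()
        {
            runs++;
            yield return 1;
            yield return 2;
        }

        IEnumerable<int> query = Source().Where(x => x > 0);
        if (materialize)
            query = query.ToList(); // run the query once and keep the results

        for (int i = 0; i < uses; i++)
            foreach (var item in query) { } // e.g. filling one list box

        return runs;
    }

    public static void Main()
    {
        Console.WriteLine(CountEnumerations(4, materialize: false));
        Console.WriteLine(CountEnumerations(4, materialize: true));
    }
}
```

Without ToList() the source is walked once per use; with ToList() it is walked once in total, which is the trade-off between memory and repeated round trips described above.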
First, avoid calling ToList() on whole tables when using LINQ. You can call it, but do so as rarely as possible.
Materializing a table that contains many rows is what gives you the OutOfMemoryException.
So, here is the code for your question:
var joinedList = (from car in db.Car.GetQueryable().Where(x => x.ProductionYear == 2005)
join driver in db.Driver.GetQueryable() on car.Id equals driver.CarId
join building in db.Building.GetQueryable() on driver.BuildingId equals building.Id
select new Building
{
Name = building.Name,
Id = building.Id,
City = building.City
}).ToList();

Linq NOT IN query - based on SQL query

I'm trying to figure out how I can convert this same SQL query into a Linq query, but I'm not seeing a way to do NOT IN with Linq like you can with SQL.
SELECT COUNT(DISTINCT ID)
FROM References
WHERE ID NOT IN (
SELECT DISTINCT ID
FROM References
WHERE STATUS = 'COMPLETED')
AND STATUS = 'FAILED'
I need to know how many distinct [ID] values exist that contain a [Status] value of "FAILED" that do not also have a [Status] of "COMPLETED". Basically, if there is a failed without a completed, i need the distinct amount for that.
var query_5 = from r in Records where r.ID NOT IN(from r in Records where
r.Record_Status == "COMPLETED" ) && (r.Record_Status == "FAILED")
select r.ID;
var rec_5 = query_5;
Console.WriteLine(rec_5.Distinct());
This was my attempt to do it, but I'm receiving numerous errors as it is not the right way to code it. Any examples on how to accomplish this would be much appreciated!
This is how the rest of my setup is looking.
public class References
{
public string ID;
public string Record_Status;
}
public static List<References> Records = new List<References>
{
};
The rough equivalent of a (not) in is using Contains(). Since the inner subquery doesn't reference the outer, you could write it like this:
var completedIds =
(from r in ctx.References
where r.Record_Status == "COMPLETED"
select r.ID).Distinct();
var count =
(from r in ctx.References
where !completedIds.Contains(r.ID)
where r.Record_Status == "FAILED"
select r.ID).Distinct().Count();
You could use the Except method:
var completed =
(from r in Records
where r.Record_Status == "COMPLETED"
select r.ID).Distinct();
var failed =
(from r in Records
where r.Record_Status == "FAILED"
select r.ID).Distinct();
var countFailedNotCompleted = failed.Except(completed).Count();
Note that this does not require using Contains during iteration. The sequences will be compared all at once within the Except method. You could also tack ToArray() on to each of the distinct sequences to ensure minimal iteration in the case where you want to use those sequences more than once.
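As a runnable illustration of the Except approach, here is a small in-memory sketch. The ReferenceRow class and the sample rows are made up for the example; only the ID and Record_Status field names come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the fields in the question.
public sealed class ReferenceRow
{
    public string ID;
    public string Record_Status;
}

public static class NotInSketch
{
    // Counts distinct IDs that have a FAILED row but no COMPLETED row.
    public static int CountFailedNotCompleted(IEnumerable<ReferenceRow> records)
    {
        var completed = records.Where(r => r.Record_Status == "COMPLETED").Select(r => r.ID);
        var failed = records.Where(r => r.Record_Status == "FAILED").Select(r => r.ID);

        // Except is a set difference, so its result contains no duplicates.
        return failed.Except(completed).Count();
    }

    public static void Main()
    {
        var records = new List<ReferenceRow>
        {
            new ReferenceRow { ID = "A", Record_Status = "FAILED" },
            new ReferenceRow { ID = "A", Record_Status = "COMPLETED" },
            new ReferenceRow { ID = "B", Record_Status = "FAILED" },
            new ReferenceRow { ID = "B", Record_Status = "FAILED" },
            new ReferenceRow { ID = "C", Record_Status = "FAILED" },
            new ReferenceRow { ID = "D", Record_Status = "COMPLETED" },
        };
        Console.WriteLine(CountFailedNotCompleted(records));
    }
}
```

Because Except already behaves as a set operation, the explicit Distinct() calls are optional when only the final count is needed.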

Use Array in Linq query

My question got downvoted and put on hold because it was not specific enough. I'll try to be more specific.
Before linq I would do this query
sql="SELECT products.* FROM products INNER JOIN productaccess ON products.id=productaccess.productid"
Now with Entity Framework and LINQ I can do this
var products = (from lProducts in db.Products
join lProductAccess in db.ProductAccess on lProducts.ID equals lProductAccess.ProductID
select lProducts).ToList();
But what if I want the flexibility to get either all products or only the accessible ones?
In sql I can do this
sql="SELECT products.* FROM products "
if (useProductAccess) {
sql+=" INNER JOIN productaccess ON products.id=productaccess.productid"
}
In Linq I have to make a separate linq statement.
if (useProductAccess) {
var productsFiltered = (from lProducts in db.Products
join lProductAccess in db.ProductAccess on lProducts.ID equals lProductAccess.ProductID
select lProducts).ToList();
} else {
var productsAll = (from lProducts in db.Products select lProducts).ToList();
}
Now, I could just get all the lProducts and then filter them in an additional LINQ statement with lProductAccess, but then I would be loading an unnecessarily large amount of data.
Is it an option to use:
var productsAccessible = (from lProductAccess in db.ProductAccess where lProductAccess.CustID==custID select lProductAccess.ProductID).ToArray();
var products = (from lProducts in db.Products
where (useProductAccess ?
productsAccessible.Contains(lProducts.ID)
: true)
select lProducts).ToList();
The LINQ provider may not know how to translate the ternary operator (? and :) into valid SQL. You could try this instead:
var query = db.Products.AsQueryable();
if (useProductAccess)
query = query.Where(p => productsAccessible.Contains(p.ID));
var result = query.ToList();
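The same build-the-query-conditionally pattern can be tried out against in-memory data. In this sketch the product IDs are plain integers and the method name GetProductIds is hypothetical; with a real context, the IQueryable would come from the database set rather than from AsQueryable() on a collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConditionalFilterSketch
{
    // Adds the access filter only when it is requested, then runs the query once.
    public static List<int> GetProductIds(
        IEnumerable<int> productIds,
        ICollection<int> accessibleIds,
        bool useProductAccess)
    {
        IQueryable<int> query = productIds.AsQueryable();

        if (useProductAccess)
            query = query.Where(id => accessibleIds.Contains(id));

        return query.ToList();
    }

    public static void Main()
    {
        var products = new[] { 1, 2, 3, 4 };
        var accessible = new List<int> { 2, 4 };

        Console.WriteLine(string.Join(",", GetProductIds(products, accessible, true)));
        Console.WriteLine(string.Join(",", GetProductIds(products, accessible, false)));
    }
}
```

Declaring the variable as IQueryable<T> is what allows the result of Where to be assigned back to it.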
I used the express profiler to see how the linq statement is translated into sql. It shows that the
productsAccessible.Contains(lProducts.ID)
part gets translated as
products.id in (comma separated list of values)
My conclusion is it will work fine.
Are there possible drawbacks?
Sure - it may produce an inefficient query, or it may not even work.
One thing to note is that your conditional operator won't compile; you can't return a bool and an int from the ternary operator.
Maybe you mean:
var products = (from lProducts in db.Products
where (useProductAccess ?
productsAccessible.Contains(lProducts.ID)
: true)
select lProducts).ToList();
or build your query up using method syntax and only add the where clause if necessary.

Using DISTINCT on a subquery to remove duplicates in Entity Framework

I have a question about the use of Distinct with Entity Framework, using SQL Server 2005. In this example:
practitioners = from p in context.Practitioners
join pn in context.ProviderNetworks on
p.ProviderId equals pn.ProviderId
where notNetworkIds.Contains(pn.Network)
select p;
practitioners = practitioners
.Distinct()
.OrderByDescending(p => p.UpdateDate);
data = practitioners.Skip(PageSize * (pageOffset ?? 0)).Take(PageSize).ToList();
It all works fine, but the use of distinct is very inefficient. Larger result sets incur unacceptable performance. The DISTINCT is killing me. The distinct is only needed because multiple networks can be queried, causing Providers records to be duplicated. In effect I need to ask the DB "only return providers ONCE even if they're in multiple networks". If I could place the DISTINCT on the ProviderNetworks, the query runs much faster.
How can I cause EF to add the DISTINCT only to the subquery, not to the entire result set?
The resulting simplified sql I DON'T want is:
select DISTINCT p.* from Providers p
inner join Networks pn on p.ProviderId = pn.ProviderId
where NetworkName in ('abc','def')
IDEAL sql is:
select p.* from Providers p
inner join (select DISTINCT ProviderId from Networks
where NetworkName in ('abc','def'))
as pn on p.ProviderId = pn.ProviderId
Thanks
Dave
I don't think you need a Distinct here but an Exists (or Any, as it is called in LINQ).
Try this:
var q = (from p in context.Practitioners
where context.ProviderNetworks.Any(pn => pn.ProviderId == p.ProviderId && notNetworkIds.Contains(pn.Network))
orderby p.UpdateDate descending
select p).Skip(PageSize * (pageOffset ?? 0)).Take(PageSize).ToList();
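To see why Any removes the need for Distinct, here is a small in-memory sketch. The two classes and the sample data are invented for illustration; only the ProviderId and Network property names follow the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities in the question.
public sealed class Practitioner { public int ProviderId; public string Name; }
public sealed class ProviderNetwork { public int ProviderId; public string Network; }

public static class AnySketch
{
    // Each practitioner is tested once, so it appears at most once
    // in the result no matter how many networks match.
    public static List<string> InNetworks(
        IEnumerable<Practitioner> practitioners,
        IEnumerable<ProviderNetwork> links,
        ICollection<string> networks)
    {
        return practitioners
            .Where(p => links.Any(pn => pn.ProviderId == p.ProviderId
                                        && networks.Contains(pn.Network)))
            .Select(p => p.Name)
            .ToList();
    }

    public static void Main()
    {
        var practitioners = new[]
        {
            new Practitioner { ProviderId = 1, Name = "Ann" },
            new Practitioner { ProviderId = 2, Name = "Bob" },
            new Practitioner { ProviderId = 3, Name = "Cy" },
        };
        var links = new[]
        {
            new ProviderNetwork { ProviderId = 1, Network = "abc" },
            new ProviderNetwork { ProviderId = 1, Network = "def" },
            new ProviderNetwork { ProviderId = 2, Network = "xyz" },
            new ProviderNetwork { ProviderId = 3, Network = "def" },
        };
        var wanted = new List<string> { "abc", "def" };

        Console.WriteLine(string.Join(",", InNetworks(practitioners, links, wanted)));
    }
}
```

A join would emit Ann twice here (once per matching network), which is the duplication that forced the Distinct in the original query.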

Is this LINQ Query "correct"?

I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then the LastWinnerDate is a single record from the winner table (possible multiple records) indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possible multiple records).
Video1 is basically a bool indicating there is, or is not a record for the user in the winner table matching on a third table Objective (should be 1 or 0 rows).
Quiz1 is same as Video 1 matching another record from Objective Table (should be 1 or 0 rows).
Video and Quiz is repeated 12 times because it is for a report to be displayed to a user listing all user records and indicate if they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};
You're repeating the join against the Winner table several times. To avoid that, you can break the query into several consecutive selects. Instead of one huge select, you can write two selects with less code. In your example I would first select each user's winners into a variable before selecting the other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have more simple and less repeatable code
using winners collection from "upWinnerPair.winners"*/
The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as IdB
from alpha a
left join beta b on a.Id = b.Id
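When the same left join pattern runs over in-memory collections instead of a database, the unmatched side comes back as null, so it needs a null check before b.Id is read. The sketch below assumes a hypothetical Item class with an Id field and formats each pair as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with the common Id property.
public sealed class Item { public int Id; }

public static class LeftJoinSketch
{
    // Left join of alpha onto beta; unmatched alpha rows are paired with "null".
    public static List<string> LeftJoin(IEnumerable<Item> alpha, IEnumerable<Item> beta)
    {
        var query =
            from a in alpha
            join btemp in beta on a.Id equals btemp.Id into bleft
            from b in bleft.DefaultIfEmpty()   // b is null when nothing matched
            select a.Id + " -> " + (b == null ? "null" : b.Id.ToString());

        return query.ToList();
    }

    public static void Main()
    {
        var alpha = new[] { new Item { Id = 1 }, new Item { Id = 2 }, new Item { Id = 3 } };
        var beta = new[] { new Item { Id = 2 }, new Item { Id = 3 }, new Item { Id = 3 } };

        foreach (var line in LeftJoin(alpha, beta))
            Console.WriteLine(line);
    }
}
```

Every alpha row appears at least once, and rows with several matches in beta appear once per match, just as with a SQL left join.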
It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).
