I have a database with the following schema:
Now, I'm trying to pull all landing pages for a domain and sort them by the FilterType of the first UrlFilter that matches a certain group. This is the LINQ I've come up with so far:
var baseQuery = DbSet.AsNoTracking()
    .Where(e => EF.Functions.Contains(EF.Property<string>(e, "Url"), $"\"{searchTerm}*\""))
    .Where(e => e.DomainLandingPages.Select(lp => lp.DomainId).Contains(domainId));
var count = baseQuery.Count();
var page = baseQuery
    .Select(e => new
    {
        LandingPage = e,
        UrlFilter = e.LandingPageUrlFilters.FirstOrDefault(f => f.UrlFilter.GroupId == groupId)
    })
    .Select(e => new
    {
        e.LandingPage,
        FilterType = e.UrlFilter == null ? UrlFilterType.NotCovered : e.UrlFilter.UrlFilter.UrlFilterType
    })
    .OrderBy(e => e.FilterType)
    .Skip(10).Take(75).ToList();
Now, while this technically works, it's quite slow, with execution times ranging from 10 to 30 seconds, which is not good enough for the use case. The LINQ is translated to the following SQL:
SELECT [l1].[Id], [l1].[LastUpdated], [l1].[Url], CASE
    WHEN (
        SELECT TOP(1) [l].[LandingPageId]
        FROM [LandingPageUrlFilters] AS [l]
        INNER JOIN [UrlFilters] AS [u] ON [l].[UrlFilterId] = [u].[Id]
        WHERE ([l1].[Id] = [l].[LandingPageId]) AND ([u].[GroupId] = @__groupId_3)) IS NULL THEN 4
    ELSE (
        SELECT TOP(1) [u0].[UrlFilterType]
        FROM [LandingPageUrlFilters] AS [l0]
        INNER JOIN [UrlFilters] AS [u0] ON [l0].[UrlFilterId] = [u0].[Id]
        WHERE ([l1].[Id] = [l0].[LandingPageId]) AND ([u0].[GroupId] = @__groupId_3))
END AS [FilterType]
FROM [LandingPages] AS [l1]
WHERE CONTAINS([l1].[Url], @__Format_1) AND @__domainId_2 IN (
    SELECT [d].[DomainId]
    FROM [DomainLandingPages] AS [d]
    WHERE [l1].[Id] = [d].[LandingPageId]
)
ORDER BY CASE
    WHEN (
        SELECT TOP(1) [l2].[LandingPageId]
        FROM [LandingPageUrlFilters] AS [l2]
        INNER JOIN [UrlFilters] AS [u1] ON [l2].[UrlFilterId] = [u1].[Id]
        WHERE ([l1].[Id] = [l2].[LandingPageId]) AND ([u1].[GroupId] = @__groupId_3)) IS NULL THEN 4
    ELSE (
        SELECT TOP(1) [u2].[UrlFilterType]
        FROM [LandingPageUrlFilters] AS [l3]
        INNER JOIN [UrlFilters] AS [u2] ON [l3].[UrlFilterId] = [u2].[Id]
        WHERE ([l1].[Id] = [l3].[LandingPageId]) AND ([u2].[GroupId] = @__groupId_3))
END
OFFSET @__p_4 ROWS FETCH NEXT @__p_5 ROWS ONLY
Now my question is: how can I improve the execution time of this, either in SQL or in LINQ?
EDIT: So I've been tinkering with some raw SQL and this is what I've come up with:
with matched_urls as (
    select l.id, min(f.urlfiltertype) as Filter
    from landingpages l
    join landingpageurlfilters lpf on lpf.landingpageid = l.id
    join urlfilters f on lpf.urlfilterid = f.id
    where f.groupid = @groupId
      and contains(Url, '"barz*"')
    group by l.id
)
select l.id, 5 as Filter
from landingpages l
where @domainId in (
    select domainid
    from domainlandingpages dlp
    where l.id = dlp.landingpageid
) and l.id not in (select id from matched_urls) and contains(Url, '"barz*"')
union select * from matched_urls
order by Filter
offset 10 rows fetch next 30 rows only
This performs somewhat okay, cutting the execution time down to about 5 seconds. As this is to be used for a table search, however, I would like to get it down even further. Is there any way to improve this SQL?
You're right to look at the generated SQL. In general, I would advise learning SQL, writing a well-performing SQL query, and working your way back from it: either use a stored procedure or raw SQL, or design your LINQ query with that same philosophy.
I suspect this will be better (not tested):
var page = (
    from e in baseQuery
    // Only filters of the requested group, lowest filter type first.
    let urlFilter = e.LandingPageUrlFilters
        .Where(f => f.UrlFilter.GroupId == groupId)
        .OrderBy(f => f.UrlFilter.UrlFilterType)
        .FirstOrDefault()
    let filterType = urlFilter == null ? UrlFilterType.NotCovered : urlFilter.UrlFilter.UrlFilterType
    orderby filterType
    select new
    {
        LandingPage = e,
        FilterType = filterType
    }
).Skip(10).Take(75).ToList();
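As a LINQ counterpart to the asker's CTE (smallest filter type per landing page within the group, otherwise "not covered", then order and page), a group join can express the same shape. The sketch below runs over in-memory lists so it can be tried without a database. The record types, the enum members other than NotCovered = 4 (taken from the `THEN 4` in the generated SQL), and the sample data are hypothetical, and whether EF Core translates this exact shape efficiently is an assumption to check against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities in the question.
// Only NotCovered = 4 comes from the generated SQL; the other members are made up.
public enum UrlFilterType { Include = 1, Exclude = 2, NotCovered = 4 }

public record LandingPage(int Id, string Url);
public record UrlFilter(int Id, int GroupId, UrlFilterType UrlFilterType);
public record LandingPageUrlFilter(int LandingPageId, int UrlFilterId);

public static class FilterTypeSorting
{
    public static List<(int LandingPageId, UrlFilterType FilterType)> Page(
        IEnumerable<LandingPage> pages,
        IEnumerable<LandingPageUrlFilter> links,
        IEnumerable<UrlFilter> filters,
        int groupId, int skip, int take)
    {
        // Filter types attached to each landing page, restricted to the requested group
        // (the matched_urls CTE in the question).
        var matched =
            from l in links
            join f in filters on l.UrlFilterId equals f.Id
            where f.GroupId == groupId
            select new { l.LandingPageId, f.UrlFilterType };

        return (
            from p in pages
            join m in matched on p.Id equals m.LandingPageId into ms
            // Smallest matching filter type, or NotCovered when nothing matched.
            let filterType = ms.Any() ? ms.Min(x => x.UrlFilterType) : UrlFilterType.NotCovered
            // p.Id breaks ties so that Skip/Take pages are stable.
            orderby filterType, p.Id
            select (LandingPageId: p.Id, FilterType: filterType)
        ).Skip(skip).Take(take).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var pages = new[] { new LandingPage(1, "/a"), new LandingPage(2, "/b"), new LandingPage(3, "/c") };
        var filters = new[]
        {
            new UrlFilter(10, 7, UrlFilterType.Exclude),
            new UrlFilter(11, 7, UrlFilterType.Include),
            new UrlFilter(12, 8, UrlFilterType.Include), // different group, ignored
        };
        var links = new[]
        {
            new LandingPageUrlFilter(1, 10),
            new LandingPageUrlFilter(2, 10),
            new LandingPageUrlFilter(2, 11),
            new LandingPageUrlFilter(3, 12),
        };

        foreach (var row in FilterTypeSorting.Page(pages, links, filters, groupId: 7, skip: 0, take: 10))
            Console.WriteLine($"{row.LandingPageId}: {row.FilterType}");
    }
}
```

The tie-breaker on the id matters for paging: with only a non-unique sort key, OFFSET/FETCH (and Skip/Take) is not guaranteed to return consistent pages.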
One way to improve the execution time is to look at the execution plan in SSMS (SQL Server Management Studio).
After looking at the execution plan you can design some indexes or, if you have no experience with this, check whether SSMS recommends any missing indexes.
Next, create the indexes, execute the query again, and see whether the execution time has improved.
Note: this is only one of many possible ways to improve execution time.
Related
I have the following query, which runs very fast:
var query =
    (from art in ctx.Articles
     join phot in ctx.ArticlePhotos on art.Id equals phot.ArticleId
     join artCat in ctx.ArticleCategories on art.Id equals artCat.ArticleId
     join cat in ctx.Categories on artCat.CategoryId equals cat.Id
     where art.Active && art.ArticleCategories.Any(c => c.Category.MaterializedPath.StartsWith(categoryPath))
     orderby art.PublishDate descending
     select new ArticleSmallResponse
     {
         Id = art.Id,
         Title = art.Title,
         Active = art.Active,
         PublishDate = art.PublishDate ?? art.CreateDate,
         MainImage = phot.RelativePath,
         RootCategory = art.Category.Name,
         Summary = art.Summary
     })
    .AsNoTracking().Take(request.Take);
However, if I add a GroupBy and change the query to the following statement, it runs much, much slower:
var query =
    (from art in ctx.Articles
     join phot in ctx.ArticlePhotos on art.Id equals phot.ArticleId
     join artCat in ctx.ArticleCategories on art.Id equals artCat.ArticleId
     join cat in ctx.Categories on artCat.CategoryId equals cat.Id
     where art.Active && art.ArticleCategories.Any(c => c.Category.MaterializedPath.StartsWith(categoryPath))
     orderby art.PublishDate descending
     select new ArticleSmallResponse
     {
         Id = art.Id,
         Title = art.Title,
         Active = art.Active,
         PublishDate = art.PublishDate ?? art.CreateDate,
         MainImage = phot.RelativePath,
         RootCategory = art.Category.Name,
         Summary = art.Summary
     })
    .GroupBy(m => m.Id)
    .Select(m => m.FirstOrDefault())
    .AsNoTracking().Take(request.Take);
The homepage calls this query 9 times, once for each category. With the first version of the query, with caching turned off and SQL Server accessed remotely, the page loads in around 1.5 seconds, which makes it almost instant when the application runs on the server; with the second version, the homepage takes around 39 seconds to load when SQL Server is remote.
Can it be fixed without rewriting the entire query into a view or stored procedure?
Grouping is an expensive operation on the database end. Without knowing what your database looks like and which indexes you've set up, it's difficult to say exactly why it is this slow. Why not just group on the client side after the data has arrived (assuming it's not an overwhelming amount)?
This question explains how: "Group by in LINQ".
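Grouping on the client could look like the following sketch. It assumes the ungrouped query (without Take) returns a manageable number of rows; ArticleSmallResponse is trimmed to a few hypothetical properties, and the in-memory list stands in for the materialized result of the fast query (for example via query.AsEnumerable()). In LINQ to Objects, GroupBy yields groups in the order their keys first appear and keeps each group's elements in source order, so g.First() is the newest row per article when the input is already sorted by PublishDate descending.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down, hypothetical version of the DTO from the question.
public record ArticleSmallResponse(int Id, string Title, DateTime PublishDate, string MainImage);

public static class ClientSideGrouping
{
    // 'rows' stands in for the already-materialized, ungrouped query results,
    // sorted by PublishDate descending (one row per article/photo/category match).
    public static List<ArticleSmallResponse> FirstPerArticle(IEnumerable<ArticleSmallResponse> rows, int take) =>
        rows
            .GroupBy(r => r.Id)     // groups come out in order of first appearance
            .Select(g => g.First()) // first row of each group, in source order
            .Take(take)
            .ToList();
}

public static class Program
{
    public static void Main()
    {
        var rows = new List<ArticleSmallResponse>
        {
            new(1, "Newest", new DateTime(2020, 1, 3), "a1.jpg"),
            new(1, "Newest", new DateTime(2020, 1, 3), "a2.jpg"), // same article, second photo
            new(2, "Older", new DateTime(2020, 1, 2), "b1.jpg"),
            new(3, "Oldest", new DateTime(2020, 1, 1), "c1.jpg"),
        };

        foreach (var a in ClientSideGrouping.FirstPerArticle(rows, take: 2))
            Console.WriteLine($"{a.Id} {a.Title} {a.MainImage}");
    }
}
```

On .NET 6 or later, rows.DistinctBy(r => r.Id) expresses the same "first row per key" intent more directly.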
My ASP.NET application has the following LINQ to SQL function to get a distinct list of height values from the product table:
public static List<string> getHeightList(string catID)
{
    using (CategoriesClassesDataContext db = new CategoriesClassesDataContext())
    {
        var heightTable = (from p in db.Products
                           join cp in db.CatProducts on p.ProductID equals cp.ProductID
                           where p.Enabled == true && (p.CaseOnly == null || p.CaseOnly == false) && cp.CatID == catID
                           select new { Height = p.Height, sort = Convert.ToDecimal(p.Height.Replace("\"", "")) })
                          .Distinct().OrderBy(s => s.sort);
        List<string> heightList = new List<string>();
        foreach (var s in heightTable)
        {
            heightList.Add(s.Height.ToString());
        }
        return heightList;
    }
}
I ran Redgate SQL Monitor which shows that this query is using a lot of resources.
Redgate is also showing that I am running the following query:
select count(distinct [height]) from product p
join catproduct cp on p.productid = cp.productid
join cat c on cp.catid = c.catid
where p.enabled=1 and p.displayfilter = 1 and c.catid = 'C2-14'
My questions are:
Can you suggest a change to the function so that it uses fewer resources?
Also, how does LINQ to SQL generate the above query from my function? (I did not write select count(distinct [height]) from product anywhere in the code.)
There are 90,000 records in the products table. The category for which I am trying to get the distinct list of heights has 50,000 product records.
Thank you in advance,
Nick
First of all, the SQL query you posted and your LINQ query don't match at all. Also, it's not the LINQ query but the underlying SQL query itself that is slow. Make sure all the columns involved in the JOIN ... ON, WHERE, and ORDER BY clauses are properly indexed so that you get a better execution plan; otherwise you will end up with a full table scan and a sort, and the query is bound to perform slowly.
The join multiplies the number of Products the query returns, and you apply Distinct at the end to undo that. It should reduce database resources if you return unique Products right away:
var heightTable = (from p in db.Products
                   where p.CatProducts.Any(cp => cp.CatID == catID)
                      && p.Enabled == true && p.CaseOnly != true // CaseOnly is null or false
                   select new
                   {
                       Height = p.Height,
                       sort = Convert.ToDecimal(p.Height.Replace("\"", ""))
                   }).Distinct().OrderBy(s => s.sort);
This changes the join into a where clause (an EXISTS check), so the database engine no longer has to deduplicate rows that the join multiplied; Distinct is kept only because many products share the same height.
If that still performs poorly, try doing the conversion and ordering in memory, i.e. after receiving the raw results from the database.
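Doing the conversion and ordering in memory could look roughly like this: the database only returns the Height strings, and the parsing and numeric sort happen in .NET. The helper name is hypothetical, and parsing with CultureInfo.InvariantCulture (so "4.5" parses the same on every server) is an assumption; the original Convert.ToDecimal uses the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class HeightList
{
    // 'rawHeights' stands in for the Height strings returned by the database,
    // e.g. values such as "12\"" (inches with a trailing double quote).
    public static List<string> SortHeights(IEnumerable<string> rawHeights) =>
        rawHeights
            .Distinct()
            // Strip the inch mark and sort numerically rather than alphabetically.
            .OrderBy(h => decimal.Parse(h.Replace("\"", ""), CultureInfo.InvariantCulture))
            .ToList();
}

public static class Program
{
    public static void Main()
    {
        var fromDatabase = new[] { "12\"", "4.5\"", "12\"", "6\"" };
        Console.WriteLine(string.Join(", ", HeightList.SortHeights(fromDatabase)));
    }
}
```

This assumes every Height value parses as a number once the quote is removed; decimal.Parse throws a FormatException otherwise.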
As for the count, I don't know where it comes from. Such queries are typically generated by paging libraries such as PagedList, but I see no trace of that in your code.
Side note: you can simply
return heightTable.Select(x => x.Height.ToString()).ToList();
... instead of building the list yourself.
I've created a code-first database, and I'm having some difficulty translating this SQL statement into C# code.
Below are the SQL statement I need help adapting and the tables I currently use. The table TableViewedMessageLogs records which user has seen which message, and the goal of the query is to select all messages that a certain user has not read yet.
http://gyazo.com/0105c0959bdd2930272bf5c07a112a11
select * from TableMessages tm
where tm.Id not in (select tv.Message_Id from TableViewedMessageLogs as tv
                    where tv.User_Email = 'asd@asd')
Try this query:
var data = from f in context.TableMessages
           where !(from fb in context.TableViewedMessageLogs
                   where fb.User_Email == "asd@asd"
                   select fb.Message_Id).Contains(f.Id)
           select f;
Try this:
var data = (from e in context.TableMessages
            where context.TableViewedMessageLogs
                .Where(x => x.User_Email == "asd@asd")
                .Select(x => x.Message_Id)
                .Contains(e.Id) == false
            select e)
    .ToList();
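You can split it into two parts like this, which makes it a lot easier to understand what's happening. Because viewedLogs is an IQueryable that is never enumerated on its own, it is still composed into the same SQL statement as a subquery, so the gain is readability rather than avoiding the subquery:
var viewedLogs = context.TableViewedMessageLogs.Where(w => w.User_Email == "asd@asd");
var result = context.TableMessages.Where(w => !viewedLogs.Select(v => v.Message_Id).Contains(w.Id));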
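To see the shape of this anti-join (NOT IN) without a database, here is an in-memory sketch; the record types are hypothetical stand-ins named after the tables. Against Entity Framework (assumed from the code-first setup), the viewed ids should stay an IQueryable so they are translated into the SQL subquery; in LINQ to Objects, materializing them into a HashSet avoids rescanning the log list for every message.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the two tables.
public record TableMessage(int Id, string Text);
public record TableViewedMessageLog(int Message_Id, string User_Email);

public static class UnreadMessages
{
    // Messages whose Id is not among the ids the user has viewed (NOT IN in the SQL).
    public static List<TableMessage> ForUser(
        IEnumerable<TableMessage> messages,
        IEnumerable<TableViewedMessageLog> logs,
        string email)
    {
        HashSet<int> viewedIds = logs
            .Where(l => l.User_Email == email)
            .Select(l => l.Message_Id)
            .ToHashSet();

        return messages.Where(m => !viewedIds.Contains(m.Id)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var messages = new[] { new TableMessage(1, "Hello"), new TableMessage(2, "News"), new TableMessage(3, "Reminder") };
        var logs = new[] { new TableViewedMessageLog(1, "asd@asd"), new TableViewedMessageLog(2, "someone@else") };

        foreach (var m in UnreadMessages.ForUser(messages, logs, "asd@asd"))
            Console.WriteLine($"{m.Id}: {m.Text}");
    }
}
```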
For example, I have a table:
Date |Value
----------|-----
2015/10/01|5
2015/09/01|8
2015/08/01|10
Is there any way, using LINQ to SQL, to get a new sequence where each element is the result of an arithmetic operation between consecutive elements of the previously ordered set (for example, i.Value - (i-1).Value)? It must be executed on the SQL Server 2008 side, not the application side.
For example dataContext.GetTable<X>().OrderByDescending(d => d.Date).Something(.......).ToArray(); should return 3, 2.
Is it possible?
You can try this:
var q = (
    from i in Items
    orderby i.ItemDate descending
    let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
    select new { Value = i.ItemValue - (prev == null ? 0 : prev.ItemValue) }
).ToArray();
EDIT:
If you slightly modify the above LINQ query to:
var q = (
    from i in Items
    orderby i.ItemDate descending
    let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
    select new { Value = (int?)i.ItemValue - prev.ItemValue }
).ToArray();
then you get the following T-SQL query sent to the database:
SELECT ([t0].[ItemValue]) - ((SELECT [t2].[ItemValue]
FROM (SELECT TOP (1) [t1].[ItemValue]
FROM [Items] AS [t1]
WHERE [t1].[ItemDate] < [t0].[ItemDate]) AS [t2]
)) AS [Value]
FROM [Items] AS [t0]
ORDER BY [t0].[ItemDate] DESC
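My guess is that with an index on the ItemDate column this shouldn't perform too badly. Note, however, that the TOP (1) subquery has no ORDER BY, so SQL Server may return any earlier row rather than the immediately preceding one; adding .OrderByDescending(x => x.ItemDate) before .FirstOrDefault() makes that choice deterministic.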
I wouldn't let SQL Server do this; I think it would produce an inefficient query.
I could create a stored procedure, but if the amount of data is not too big I can also use LINQ to Objects:
List<X> items = dataContext.GetTable<X>().OrderByDescending(d => d.Date).ToList(); // bring data into memory
var res = items.Skip(1).Zip(items, (cur, prev) => cur.Value - prev.Value);
In the end, I might use a foreach loop for readability.
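Both in-memory variants are sketched below on the sample table from the question, already sorted by Date descending as the OrderByDescending(...).ToList() call would return it. The record X mirrors the question's placeholder type, with properties named after the Date and Value columns; the helper class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Row type matching the Date/Value table in the question.
public record X(DateTime Date, int Value);

public static class ConsecutiveDifferences
{
    // Zip version: pairs each item (from the second one on) with the item before it.
    public static List<int> WithZip(IReadOnlyList<X> items) =>
        items.Skip(1).Zip(items, (cur, prev) => cur.Value - prev.Value).ToList();

    // foreach version: same result, arguably easier to read.
    public static List<int> WithForeach(IEnumerable<X> items)
    {
        var result = new List<int>();
        int? previousValue = null;
        foreach (var cur in items)
        {
            if (previousValue.HasValue)
                result.Add(cur.Value - previousValue.Value);
            previousValue = cur.Value;
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Already ordered by Date descending.
        var items = new List<X>
        {
            new X(new DateTime(2015, 10, 1), 5),
            new X(new DateTime(2015, 9, 1), 8),
            new X(new DateTime(2015, 8, 1), 10),
        };

        Console.WriteLine(string.Join(", ", ConsecutiveDifferences.WithZip(items)));     // 3, 2
        Console.WriteLine(string.Join(", ", ConsecutiveDifferences.WithForeach(items))); // 3, 2
    }
}
```

Both produce 3, 2 for the sample data, which is the result the question asks for.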
I'm trying to rewrite a SQL query in LINQ to Entities. I'm using LINQPad with a typed datacontext from my own assembly to test things out.
The SQL query I'm trying to rewrite:
SELECT DISTINCT variantID AS setID, option_value AS name, option_value_description AS description, sort_order as sortOrder
FROM all_products_option_names AS lst
WHERE lst.optionID=14 AND lst.productID IN (SELECT productID FROM all_products_option_names
WHERE optionID=7 AND option_value IN (SELECT name FROM brands
WHERE brandID=1))
ORDER BY sortOrder;
The LINQ to Entities query I've come up with so far (which doesn't work due to a timeout error):
from a in all_products_option_names
where a.optionID == 14
      && all_products_option_names.Any(x => x.productID == a.productID
                                            && x.optionID == 7
                                            && brands.Any(y => y.name == x.option_value && y.brandID == 1))
select new
{
    id = a.variantID,
    name = a.option_value,
    description = a.option_value_description,
    sortOrder = a.sort_order,
}
This is the error I get when I run the above query: An error occurred while executing the command definition. See the inner exception for details.
And the inner exception is: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Edit:
I use MySQL and probably that's why LINQPad doesn't show me the generated SQL.
The SQL version doesn't time out.
Edit 2:
I solved the problem by completely changing the query, so this question is irrelevant now.
I marked Steven's response as the correct one, because he was closest to what I was trying to achieve and his response gave me the idea that led me to the solution.
Try this:
var brandNames =
    from brand in db.Brands
    where brand.brandID == 1
    select brand.name;
var brandProductNames =
    from p in db.all_products_option_names
    where p.optionID == 7
    where brandNames.Contains(p.option_value)
    select p.productID;
var results =
    (from p in db.all_products_option_names
     where p.optionID == 14
     where brandProductNames.Contains(p.productID)
     select new
     {
         setID = p.variantID,
         name = p.option_value,
         description = p.option_value_description,
         sortOrder = p.sort_order
     })
    .Distinct()
    .OrderBy(r => r.sortOrder);
I would recommend using joins rather than sub-selects as you have them. Correlated sub-selects can be inefficient because the database may evaluate them once per outer row, much like having loops inside loops in code; MySQL versions before 5.6, for example, executed IN (subquery) this way. That could actually cause the timeout you're getting if your database is running slowly, even though it looks like a simple query.
I would try using joins with a distinct at the end like this:
var results =
    (from p in db.all_products_option_names
     join p2 in db.all_products_option_names on p.productID equals p2.productID
     join b in db.Brands on p2.option_value equals b.name
     where p.optionID == 14
     where p2.optionID == 7
     where b.brandID == 1
     select new
     {
         setID = p.variantID,
         name = p.option_value,
         description = p.option_value_description,
         sortOrder = p.sort_order
     })
    .Distinct()
    .OrderBy(r => r.sortOrder);
Or you could try a group join (join ... into) combined with Any(), like so:
var results =
    from p in db.all_products_option_names
    join p2 in (from p3 in db.all_products_option_names.Where(x => x.optionID == 7)
                join b in db.Brands.Where(x => x.brandID == 1) on p3.option_value equals b.name
                select p3)
        on p.productID equals p2.productID into pg
    where p.optionID == 14
    where pg.Any()
    select new
    {
        setID = p.variantID,
        name = p.option_value,
        description = p.option_value_description,
        sortOrder = p.sort_order
    };
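The three formulations in these answers (nested Contains, joins plus Distinct, and a group join with Any) are meant to return the same rows. The sketch below checks that on a tiny in-memory data set. The record types and sample rows are hypothetical; brandID 1 and option ids 7 and 14 come from the question. Matching results in memory say nothing about which form MySQL executes fastest, so the generated SQL and its execution plan still need checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory rows named after the question's tables and columns.
public record OptionRow(int productID, int optionID, int variantID, string option_value, int sort_order);
public record Brand(int brandID, string name);

public static class BrandOptions
{
    // 1) Nested Contains (IN (SELECT ...) in SQL).
    public static List<(int SetId, string Name, int SortOrder)> WithContains(
        IEnumerable<OptionRow> rows, IEnumerable<Brand> brands, int brandId)
    {
        var brandNames = brands.Where(b => b.brandID == brandId).Select(b => b.name);
        var brandProductIds = rows
            .Where(p => p.optionID == 7 && brandNames.Contains(p.option_value))
            .Select(p => p.productID);

        return rows
            .Where(p => p.optionID == 14 && brandProductIds.Contains(p.productID))
            .Select(p => (SetId: p.variantID, Name: p.option_value, SortOrder: p.sort_order))
            .Distinct()
            .OrderBy(t => t.SortOrder).ThenBy(t => t.SetId)
            .ToList();
    }

    // 2) Plain joins; Distinct removes the duplicates the joins can introduce.
    public static List<(int SetId, string Name, int SortOrder)> WithJoins(
        IEnumerable<OptionRow> rows, IEnumerable<Brand> brands, int brandId) =>
        (from p in rows
         join p2 in rows on p.productID equals p2.productID
         join b in brands on p2.option_value equals b.name
         where p.optionID == 14 && p2.optionID == 7 && b.brandID == brandId
         select (SetId: p.variantID, Name: p.option_value, SortOrder: p.sort_order))
        .Distinct()
        .OrderBy(t => t.SortOrder).ThenBy(t => t.SetId)
        .ToList();

    // 3) Group join (join ... into) combined with Any().
    public static List<(int SetId, string Name, int SortOrder)> WithGroupJoin(
        IEnumerable<OptionRow> rows, IEnumerable<Brand> brands, int brandId) =>
        (from p in rows
         join p2 in (from p3 in rows.Where(x => x.optionID == 7)
                     join b in brands.Where(x => x.brandID == brandId) on p3.option_value equals b.name
                     select p3)
             on p.productID equals p2.productID into pg
         where p.optionID == 14
         where pg.Any()
         select (SetId: p.variantID, Name: p.option_value, SortOrder: p.sort_order))
        .Distinct()
        .OrderBy(t => t.SortOrder).ThenBy(t => t.SetId)
        .ToList();
}

public static class Program
{
    public static void Main()
    {
        var brands = new[] { new Brand(1, "Acme"), new Brand(2, "Other") };
        var rows = new[]
        {
            new OptionRow(100, 7, 0, "Acme", 0),
            new OptionRow(100, 14, 1, "Red", 2),
            new OptionRow(101, 7, 0, "Acme", 0),
            new OptionRow(101, 14, 1, "Red", 2),   // same set as product 100
            new OptionRow(101, 14, 2, "Blue", 1),
            new OptionRow(102, 7, 0, "Other", 0),
            new OptionRow(102, 14, 3, "Green", 0), // other brand, excluded
        };

        foreach (var r in BrandOptions.WithContains(rows, brands, brandId: 1))
            Console.WriteLine($"{r.SetId} {r.Name} {r.SortOrder}");
    }
}
```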