I have used the Kendo DataSourceResult ToDataSourceResult(this IQueryable enumerable, DataSourceRequest request); extension extensively and never noticed a performance issue until now, when querying a table of 40 million records.
I wrote the Take(10) query as a benchmark, since it is equivalent to the request being passed in.
This is my read action:
public ActionResult ReadAll([DataSourceRequest] DataSourceRequest request)
{
var startTimer = DateTime.Now;
var context = Helpers.EFTools.GetCADataContext();
Debug.WriteLine(string.Format("{0} : Got Context", DateTime.Now - startTimer));
var events = from e in context.Events
select
new Models.Event()
{
Id = e.Id,
DateTime = e.EventDateTime,
HostId = e.Door.HostId,
SiteId = e.Door.Host.SiteId,
UserId = (int)e.UserId,
UserName = e.User.FirstName + " " + e.User.Surname,
DoorId = e.DoorId,
Door = e.Door.Name,
Description = e.Description,
SubDescription = e.SubDescription
};
Debug.WriteLine(string.Format("{0} : Built Query", DateTime.Now - startTimer));
var tenRecords = events.OrderByDescending(i => i.DateTime).Take(10).ToList();
Debug.WriteLine(string.Format("{0} : Taken 10", DateTime.Now - startTimer));
var result = events.ToDataSourceResult(request);
Debug.WriteLine(string.Format("{0} : Datasource Result", DateTime.Now - startTimer));
return this.Json(result);
}
The output from Debug:
00:00:00.1316569 : Got Context
00:00:00.1332584 : Built Query
00:00:00.2407656 : Taken 10
00:00:21.5013946 : Datasource Result
Although sometimes the query times out.
Using dbMonitor I captured both queries. First the manual Take(10):
SELECT
"Project1".id,
"Project1"."C1",
"Project1".hostid,
"Project1".siteid,
"Project1".userid,
"Project1"."C2",
"Project1".doorid,
"Project1"."name",
"Project1".description,
"Project1".subdescription
FROM ( SELECT
"Extent1".id,
"Extent1".userid,
"Extent1".description,
"Extent1".subdescription,
"Extent1".doorid,
"Extent2"."name",
"Extent2".hostid,
"Extent3".siteid,
CAST("Extent1".eventdatetime AS timestamp) AS "C1",
"Extent4".firstname || ' ' || "Extent4".surname AS "C2"
FROM public.events AS "Extent1"
INNER JOIN public.doors AS "Extent2" ON "Extent1".doorid = "Extent2".id
INNER JOIN public.hosts AS "Extent3" ON "Extent2".hostid = "Extent3".id
INNER JOIN public.users AS "Extent4" ON "Extent1".userid = "Extent4".id
) AS "Project1"
ORDER BY "Project1"."C1" DESC
LIMIT 10
And the ToDataSourceResult query:
SELECT
"GroupBy1"."A1" AS "C1"
FROM ( SELECT Count(1) AS "A1"
FROM public.events AS "Extent1"
INNER JOIN public.doors AS "Extent2" ON "Extent1".doorid = "Extent2".id
) AS "GroupBy1"
This is the DataSourceRequest request parameter passed in:
request.Aggregates Count = 0
request.Filters Count = 0
request.Groups Count = 0
request.Page 1
request.PageSize 10
request.Sorts Count = 1
This is the result of var result = events.ToDataSourceResult(request);
result.AggregateResults null
result.Data Count = 10
result.Errors null
result.Total 43642809
How can I get a DataSourceResult from the events IQueryable using the DataSourceRequest in a more efficient and faster way?
After implementing a custom binding (suggested by Atanas Korchev) with lots of debug output timestamps, it was obvious what was causing the performance issue: the total count.
The SQL I captured backs this up; I don't know why I didn't see it before.
Getting the total row count quickly is another question, but I will post any answers I find here.
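For reference, a minimal sketch of the custom-binding shape I ended up with. The names reuse the action above; the cheap-total strategy (`cachedTotal`) is an assumption for illustration, e.g. a counter refreshed on a schedule rather than a COUNT(*) per request:

```csharp
// Page and sort manually so only the cheap 10-row query hits the database;
// supply Total from a cheaper source than COUNT(*) over the full join.
var page = events
    .OrderByDescending(e => e.DateTime)
    .Skip((request.Page - 1) * request.PageSize)
    .Take(request.PageSize)
    .ToList();

var result = new DataSourceResult
{
    Data = page,
    Total = cachedTotal   // hypothetical: refreshed periodically, not per request
};
```

With this shape the expensive count query from the capture above never runs on the request path.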
I have a stored procedure in SQL Server that gets contact persons based on multiple filters (e.g. DateOfBirth, DisplayName, ...) from multiple tables. I need to alter the stored procedure to include pagination and a total count, since the pagination was previously done in the backend. PartyId is the unique key. The caveat is that a person can have multiple emails and phones; let's say we search for DisplayName = "Sarah", the query will return the following:
TotalCount PartyId DisplayName EmailAddress PhoneNumber
-----------------------------------------------------------------
3          1       Sarah       sarah@gmail.com    1
3          1       Sarah       sarah2@gmail.com   1
3          1       Sarah       sarah@gmail.com    2
This is roughly what the stored procedure does; the assigned values for @CurrentPage and @PageSize, and the ORDER BY ... OFFSET at the bottom, I included to test the pagination:
DECLARE @CurrentPage int = 1
DECLARE @PageSize int = 1000
SELECT
COUNT(*) OVER () as TotalCount,
p.Id AS PartyId,
e.EmailAddress,
pn.PhoneNumber
etc.....
FROM
[dbo].[Party] AS p WITH(NOLOCK)
INNER JOIN
[dbo].[Email] AS e WITH(NOLOCK) ON p.[Id] = e.[PartyID]
INNER JOIN
[dbo].[PhoneNumber] AS pn WITH(NOLOCK) ON p.[Id] = pn.[PartyID]
etc.....
WHERE
p.PartyType = 1 /*Individual*/
GROUP BY
p.Id, e.EmailAddress, pn.PhoneNumber etc...
ORDER BY
p.Id
OFFSET (@CurrentPage - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY
This is what we do in the backend to group by PartyId and assign the corresponding emails and phones.
var responseModel = unitOfWork.PartyRepository.SearchContacts(model);
if (responseModel != null && responseModel.Count == 0)
{
return null;
}
// get multiple phones/emails for a party
var emailAddresses = responseModel.GroupBy(p => new { p.PartyId, p.EmailAddress })
.Select(x => new {
x.Key.PartyId,
x.Key.EmailAddress
});
var phoneNumbers = responseModel.GroupBy(p => new { p.PartyId, p.PhoneNumber, p.PhoneNumberCreateDate })
.Select(x => new {
x.Key.PartyId,
x.Key.PhoneNumber,
x.Key.PhoneNumberCreateDate
}).OrderByDescending(p => p.PhoneNumberCreateDate);
// group by in order to avoid multiple records with different email/phones
responseModel = responseModel.GroupBy(x => x.PartyId)
.Select(grp => grp.First())
.ToList();
var list = Mapper.Map<List<SearchContactResponseModelData>>(responseModel);
// add all phones/emails to respective party
list = list.Select(x =>
{
x.EmailAddresses = new List<string>();
x.EmailAddresses.AddRange(emailAddresses.Where(y => y.PartyId == x.PartyId).Select(y => y.EmailAddress));
x.PhoneNumbers = new List<string>();
x.PhoneNumbers.AddRange(phoneNumbers.Where(y => y.PartyId == x.PartyId).Select(y => y.PhoneNumber));
return x;
}).ToList();
var sorted = SortAndPagination(model, model.SortBy, list);
SearchContactResponseModel result = new SearchContactResponseModel()
{
Data = sorted,
TotalCount = list.Count
};
return result;
And the response will be :
{
"TotalCount": 1,
"Data": [
{
"PartyId": 1,
"DisplayName": "SARAH",
"EmailAddresses": [
"sarah#gmail.com",
"sarah2#gmail.com"
],
"PhoneNumbers": [
"1",
"2"
]
}
]
}
The TotalCount returned from the stored procedure obviously is not the real one; after the backend code (where we assign the emails/phones and group by id) we get the real TotalCount, which is 1 instead of 3.
If we have 3 persons named Sarah, then because of the multiple phones/emails the TotalCount in the stored procedure will be, say, 9 while the real count is 3, and if I execute the stored procedure to get persons 1 to 2, the pagination won't work because of the 9 records.
How can I implement pagination in the above scenario?
You might try using a CTE to isolate the query against the Party table. This would allow you to pull the right number of rows (and the proper total row count) without having to worry about the expansion from the emails and phone numbers.
It would look something like this (rearranging your query above):
DECLARE @CurrentPage int = 1;
DECLARE @PageSize int = 1000;
WITH PartyList AS (
SELECT
COUNT(*) OVER () as TotalCount,
p.Id AS PartyId
FROM
[dbo].[Party] AS p WITH(NOLOCK)
WHERE
p.PartyType = 1 /*Individual*/
GROUP BY -- You might not need this now depending on your data
p.Id
ORDER BY
p.Id
OFFSET (@CurrentPage - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY
)
SELECT
pl.TotalCount,
pl.PartyId,
e.EmailAddress,
pn.PhoneNumber
FROM PartyList AS pl
INNER JOIN
[dbo].[Email] AS e WITH(NOLOCK) ON pl.[PartyId] = e.[PartyID]
INNER JOIN
[dbo].[PhoneNumber] AS pn WITH(NOLOCK) ON pl.[PartyId] = pn.[PartyID];
Please be aware that the CTE will require the prior statement to end in a semicolon.
I have this Method #1 query below that is parameterized using Dapper. The problem is that the query times out with this approach even after waiting 30 seconds, while it normally takes at most 1 second in SSMS as plain SQL.
However, the Method #2 query actually works, where the query is built on the server side instead of being parameterized. One thing I have noticed: it might have something to do with the filters for FirstName and LastName; I have single quotes around those filters in Method #2 but not in Method #1.
What is wrong with Method #1?
Method # 1
string query = @"SELECT *
FROM dbo.Customer c
WHERE c.MainCustomerId = @CustomerId
AND (@IgnoreCustomerId = 1 OR c.CustomerID = @FilterCustomerId)
AND (@IgnoreFirstName = 1 OR c.FirstName = @FilterFirstName)
AND (@IgnoreLastName = 1 OR c.LastName = @FilterLastName)
AND (@IgnoreMemberStatus = 1 OR c.CustomerStatusID = @FilterMemberStatus)
AND (@IgnoreMemberType = 1 OR c.CustomerTypeID = @FilterMemberType)
AND (@IgnoreRank = 1 OR c.RankID = @FilterRank)
ORDER BY c.CustomerId
OFFSET @OffSet ROWS
FETCH NEXT 50 ROWS ONLY";
_procExecutor.ExecuteSqlAsync<Report>(query, new
{
CustomerId = customerId,
IgnoreCustomerId = ignoreCustomerId,
FilterCustomerId = filter.CustomerID,
IgnoreFirstName = ignoreFirstName,
FilterFirstName = filter.FirstName,
IgnoreLastName = ignoreLastName,
FilterLastName = filter.LastName,
IgnoreMemberStatus = ignoreMemberStatus,
FilterMemberStatus = Convert.ToInt32(filter.MemberStatus),
IgnoreMemberType = ignoreMemberType,
FilterMemberType = Convert.ToInt32(filter.MemberType),
IgnoreRank = ignoreRank,
FilterRank = Convert.ToInt32(filter.Rank),
OffSet = (page - 1) * 50
});
Method # 2
string queryThatWorks =
@"SELECT *
FROM dbo.Customer c
WHERE c.MainCustomerId = @CustomerId
AND ({1} = 1 OR c.CustomerID = {2})
AND ({3} = 1 OR c.FirstName = '{4}')
AND ({5} = 1 OR c.LastName = '{6}')
AND ({7} = 1 OR c.CustomerStatusID = {8})
AND ({9} = 1 OR c.CustomerTypeID = {10})
AND ({11} = 1 OR c.RankID = {12})
ORDER BY c.CustomerId
OFFSET {13} ROWS
FETCH NEXT 50 ROWS ONLY";
_procExecutor.ExecuteSqlAsync<Report>(string.Format(queryThatWorks,
customerId,
ignoreCustomerId,
filter.CustomerID,
ignoreFirstName,
filter.FirstName,
ignoreLastName,
filter.LastName,
ignoreMemberStatus,
Convert.ToInt32(filter.MemberStatus),
ignoreMemberType,
Convert.ToInt32(filter.MemberType),
ignoreRank,
Convert.ToInt32(filter.Rank),
(page - 1) * 50
), null);
I've seen this countless times before.
I'm willing to bet that your columns are varchar, but Dapper is sending in your parameters as nvarchar. When that happens, SQL Server has to run a conversion on the value stored in each and every row. Besides being really slow, this prevents you from using indexes.
See "Ansi Strings and varchar" in https://github.com/StackExchange/dapper-dot-net
I have the following code to perform a full-text search. It creates a query, gets the total number of rows returned by that query and then retrieves the actual rows for only the current page.
// Create IQueryable
var query = from a in ArticleServerContext.Set<Article>()
where a.Approved
orderby a.UtcDate descending
select a;
// Get total rows (needed for pagination logic)
int totalRows = query.Count();
// Get rows for current page
query = query.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage);
This works fine, but it requires two round trips to the database. In the interest of optimizing the code, is there any way to rework this query so it only had one round trip to the database?
Yes, you can perform these two operations with a single query to the database:
// Create IQueryable
var query = from a in ArticleServerContext.Set<Article>()
where a.Approved
orderby a.UtcDate descending
select new { a, Total = ArticleServerContext.Set<Article>().Where(x => x.Approved).Count() };
//Get raw rows for current page with Total(Count) field
var result = query.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage).ToList();
// the data you will actually use in your logic
var actualData = result.Select(x => x.a).ToList();
// Get total rows (needed for pagination logic)
int totalRows = result.First().Total;
If you use MSSQL, the query will look like this:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[UtcDate] AS [UtcDate],
[Extent1].[Approved] AS [Approved],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Articles] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Articles] AS [Extent2]
WHERE [Extent2].[Approved] ) AS [GroupBy1]
WHERE [Extent1].[Approved]
ORDER BY [Extent1].[UtcDate] DESC
I'm not sure whether it's worth enough, but it's doable under the following constraints:
(1) CurrentPage and RowsPerPage are not affected by the totalRows value.
(2) The query is materialized after applying the paging parameters.
The trick is to use group by constant value, which is supported by EF. The code looks like this:
var query =
from a in ArticleServerContext.Set<Article>()
where a.Approved
// NOTE: order by goes below
group a by 1 into allRows
select new
{
TotalRows = allRows.Count(),
PageRows = allRows
.OrderByDescending(a => a.UtcDate)
.Skip((CurrentPage - 1) * RowsPerPage).Take(RowsPerPage)
};
var result = query.FirstOrDefault();
var totalRows = result != null ? result.TotalRows : 0;
var pageRows = result != null ? result.PageRows : Enumerable.Empty<Article>();
I'm trying to rewrite a SQL query in LINQ to Entities. I'm using LINQPad with a typed datacontext from my own assembly to test things out.
The SQL query I'm trying to rewrite:
SELECT DISTINCT variantID AS setID, option_value AS name, option_value_description AS description, sort_order as sortOrder
FROM all_products_option_names AS lst
WHERE lst.optionID=14 AND lst.productID IN (SELECT productID FROM all_products_option_names
WHERE optionID=7 AND option_value IN (SELECT name FROM brands
WHERE brandID=1))
ORDER BY sortOrder;
The LINQ to Entities query I've come up with so far (which doesn't work due to a timeout error):
from a in all_products_option_names
where a.optionID == 14 && all_products_option_names.Any(x => x.productID == a.productID && x.optionID == 7 && brands.Any(y => y.name == x.option_value && y.brandID == 1))
select new
{
id = a.variantID,
name = a.option_value,
description = a.option_value_description,
sortOrder = a.sort_order,
}
This is the error I get when I run the above query: An error occurred while executing the command definition. See the inner exception for details.
And the inner exception is: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Edit:
I use MySQL and probably that's why LINQPad doesn't show me the generated SQL.
The SQL version doesn't time out.
Edit 2:
I solved the problem by completely changing the query, so this question is irrelevant now.
I marked Steven's response as the correct one, because he was closest to what I was trying to achieve and his response gave me the idea that led me to the solution.
Try this:
var brandNames =
from brand in db.Brands
where brand.ID == 1
select brand.name;
var brandProductNames =
from p in db.all_products_option_names
where p.optionID == 7
where brandNames.Contains(p.option_value)
select p.productId;
var results =
from p in db.all_products_option_names
where p.optionID == 14
where brandProductNames.Contains(p.productId)
select new
{
setID = p.variantID,
name = p.option_value,
description = p.option_value_description,
sortOrder = p.sort_order
};
I would recommend using joins rather than the sub-selects you have. Sub-selects are not very efficient performance-wise; it's like having loops inside loops in your code, not a good idea. This could actually be causing the timeout you're getting if your database is running slowly, even though it looks like a simple query.
I would try using joins with a Distinct() at the end, like this:
var results =
(from p in db.all_products_option_names
join p2 in db.all_products_option_names on p.productId equals p2.productId
join b in db.Brands on p2.option_value equals b.name
where p.optionID == 14
where p2.optionID == 7
where b.BrandID == 1
select new
{
setID = p.variantID,
name = p.option_value,
description = p.option_value_description,
sortOrder = p.sort_order
}).Distinct();
Or you could try using joins with `into` and an `Any()`, like so:
var results =
from p in db.all_products_option_names
join p2 in (from p3 in db.all_products_option_names.Where(x => x.optionId == 7)
join b in db.Brands.Where(x => x.BrandID == 1) on p3.option_value equals b.name
select p3) into pg
where p.optionID == 14
where pg.Any()
select new
{
setID = p.variantID,
name = p.option_value,
description = p.option_value_description,
sortOrder = p.sort_order
};
Some basics
I have two tables, one holding the users and one holding a log with logins.
The user table holds 15,000+ users; the login table is growing and has reached 150,000+ rows.
The database is built upon SQL Server (not express).
To administer the users I got a gridview (ASPxGridView from Devexpress) that I populate from an ObjectDatasource.
Are there any general do's and don'ts I should know about when summing the number of logins a user has made?
Things are getting strangely slow.
Here is a picture showing the involved tables.
I’ve tried a few things.
DbDataContext db = new DbDataContext();
// Using foregin key relationship
foreach (var proUser in db.tblPROUsers)
{
var count = proUser.tblPROUserLogins.Count;
//...
}
Execution time: 01:29.316 (1 minute and 29 seconds)
// By storing a list in a local variable (I removed the FK relation)
var userLogins = db.tblPROUserLogins.ToList();
foreach (var proUser in db.tblPROUsers)
{
var count = userLogins.Where(x => x.UserId.Equals(proUser.UserId)).Count();
//...
}
Execution time: 01:18.410 (1 minute and 18 seconds)
// By storing a dictionary in a local variable (I removed the FK relation)
var userLogins = db.tblPROUserLogins.ToDictionary(x => x.UserLoginId, x => x.UserId);
foreach (var proUser in db.tblPROUsers)
{
var count = userLogins.Where(x => x.Value.Equals(proUser.UserId)).Count();
//...
}
Execution time: 01:15.821 (1 minute and 15 seconds)
The model giving the best performance is actually the dictionary. However, if you know of any other options I'd like to hear about them, and also whether there's something "bad" about this kind of coding when handling such large amounts of data.
Thanks
========================================================
UPDATED With a model according to BrokenGlass example
// By storing a dictionary in a local variable (I removed the FK relation)
foreach (var proUser in db.tblPROUsers)
{
var userId = proUser.UserId;
var count = db.tblPROUserLogins.Count(x => x.UserId.Equals(userId));
//...
}
Execution time: 02:01.135 (2 minutes and 1 second)
In addition to this I created a list storing a simple class
public class LoginCount
{
public int UserId { get; set; }
public int Count { get; set; }
}
And in the summarizing method
var loginCount = new List<LoginCount>();
// This foreach loop takes approx 30 secs
foreach (var login in db.tblPROUserLogins)
{
var userId = login.UserId;
// Check if available
var existing = loginCount.Where(x => x.UserId.Equals(userId)).FirstOrDefault();
if (existing != null)
existing.Count++;
else
loginCount.Add(new LoginCount{UserId = userId, Count = 1});
}
// Calling it
foreach (var proUser in tblProUser)
{
var user = proUser;
var userId = user.UserId;
// Count logins
var count = 0;
var loginCounter = loginCount.Where(x => x.UserId.Equals(userId)).FirstOrDefault();
if(loginCounter != null)
count = loginCounter.Count;
//...
}
Execution time: 00:36.841 (36 seconds)
Conclusion so far: summarizing with LINQ is slow, but I'm getting there!
Perhaps it would be useful if you tried to construct an SQL query that does the same thing and executed it independently of your application (in SQL Server Management Studio). Something like:
SELECT UserId, COUNT(UserLoginId)
FROM tblPROUserLogin
GROUP BY UserId
(NOTE: This just selects UserId. If you want other fields from tblPROUser, you'll need a simple JOIN "on top" of this basic query.)
Ensure there is a composite index on {UserId, UserLoginId} and that it is being used by the query plan. Having both fields in the index, in that order, ensures your query can run without touching the tblPROUserLogin table itself.
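For instance (the index name is illustrative; the table and column names are from your schema):

```sql
-- Covering index: UserId first to satisfy the GROUP BY, UserLoginId included
-- so the COUNT can be answered from the index alone without key lookups.
CREATE NONCLUSTERED INDEX IX_tblPROUserLogin_UserId_UserLoginId
    ON dbo.tblPROUserLogin (UserId, UserLoginId);
```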
Then benchmark and see if you can get a significantly better time than your LINQ code:
If yes, then you'll need to find a way to "coax" the LINQ to generate a similar query.
If no, then you are already as fast as you'll ever be.
--- EDIT ---
The following LINQ snippet is equivalent to the query above:
var db = new UserLoginDataContext();
db.Log = Console.Out;
var result =
from user_login in db.tblPROUserLogins
group user_login by user_login.UserId into g
select new { UserId = g.Key, Count = g.Count() };
foreach (var row in result) {
int user_id = row.UserId;
int count = row.Count;
// ...
}
Which prints the following text in the console:
SELECT COUNT(*) AS [Count], [t0].[UserId]
FROM [dbo].[tblPROUserLogin] AS [t0]
GROUP BY [t0].[UserId]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.30319.1
--- EDIT 2 ---
To have the "whole" user and not just UserId, you can do this:
var db = new UserLoginDataContext();
db.Log = Console.Out;
var login_counts =
from user_login in db.tblPROUserLogins
group user_login by user_login.UserId into g
select new { UserId = g.Key, Count = g.Count() };
var result =
from user in db.tblPROUsers
join login_count in login_counts on user.UserId equals login_count.UserId
select new { User = user, Count = login_count.Count };
foreach (var row in result) {
tblPROUser user = row.User;
int count = row.Count;
// ...
}
And the console output shows the following query...
SELECT [t0].[UserId], [t0].[UserGuid], [t0].[CompanyId], [t0].[UserName], [t0].[UserPassword], [t2].[value] AS [Count]
FROM [dbo].[tblPROUser] AS [t0]
INNER JOIN (
SELECT COUNT(*) AS [value], [t1].[UserId]
FROM [dbo].[tblPROUserLogin] AS [t1]
GROUP BY [t1].[UserId]
) AS [t2] ON [t0].[UserId] = [t2].[UserId]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.30319.1
...which should be very efficient provided your indexes are correct.
The second case should always be the fastest by far provided you drop the ToList() so counting can be done on the database side, not in memory:
var userId = proUser.UserId;
var count = db.tblPROUserLogins.Count(x => x.UserId == userId);
Also, you have to put the user id into a "plain" primitive variable first, since EF can't deal with mapping properties of an object.
Sorry, doing this blind since I'm not on my normal computer. Just a couple of questions:
Do you have an index on the user id in the logins table?
Have you tried a view specifically crafted for this page?
Are you using paging to get the users, or trying to get all counts at once?
Have you run SQL Profiler and watched the actual SQL being sent?
Does something like this work for you?
var allOfIt = (from c in db.tblProUsers
               select new
               {
                   User = c,
                   Count = db.tblProUserLogins.Count(l => l.UserId == c.UserId)
               })
               .Skip(pageSize * pageNumber)
               .Take(pageSize);