SQL Execution Time Slowing Down, Code First - c#

I've got a bit of a strange one here. I'm using Entity Framework Code First in a console app that runs a batch process. The code loops over a series of dates, executing a stored procedure for each one.
Currently it loops about 300 times, and each execution gets slower and slower until, near the end, it's crawling.
I've tried memory profiling and that's not it. Here's some example code:
_dbContext = new FooContext();
_barService = new BarService(new GenericRepository<Bar>(), _dbContext);
for (var date = lastCalculatedDate.AddDays(1); date <= yesterday; date = date.AddDays(1))
{
_barService.CalculateWeightings(date);
}
All CalculateWeightings does is the following (I'm using NLog as well):
public void CalculateWeightings(DateTime dateTime)
{
// Format placeholders are zero-based, so the single argument is {0}
_logger.Info("Calculating weightings for {0}", dateTime);
Context.Database.ExecuteSqlCommand("EXEC CalculateWeightings @dateTime", new SqlParameter("@dateTime", dateTime));
}
The stored procedure just populates a table with some records. Nothing complicated; the table ends up with a couple of thousand rows in it, so the problem isn't there.
Any thoughts?
For those of you wanting to see the SQL: it's a bit of a behemoth, but I can't see any reason it would slow down over time. The number of rows involved is pretty low.
CREATE PROCEDURE [dbo].[CalculateWeightings]
@StartDate DateTime,
@EndDate DateTime,
@TradedMonthStart DateTime,
@InstrumentGroupId int
AS
BEGIN
---- GET ALL THE END OF DAY PRICINGS FOR MONTHLYS ----
SELECT
ROW_NUMBER() OVER
(
PARTITION BY RawTrades.FirstSequenceItemName,
CONVERT(VARCHAR, RawTrades.LastUpdate, 103)
ORDER BY RawTrades.FirstSequenceItemName, RawTrades.LastUpdate DESC
) AS [Row],
RawTrades.FirstSequenceItemID AS MonthId,
Sequences.ActualStartMonth,
Sequences.ActualEndMonth,
RawTrades.FirstSequenceItemName AS [MonthName],
CONVERT(VARCHAR, RawTrades.LastUpdate, 103) AS LastUpdate,
RawTrades.Price
INTO #monthly
FROM RawTrades
INNER JOIN Sequences ON RawTrades.FirstSequenceItemId = Sequences.SequenceItemId AND RawTrades.FirstSequenceId = Sequences.SequenceId
WHERE RawTrades.FirstSequenceID IN (SELECT MonthlySequenceId FROM Instruments WHERE InstrumentGroupId = @InstrumentGroupId)
AND [Action] <> 'Remove'
AND LastUpdate >= @StartDate
AND LastUpdate < @EndDate
AND ActualStartMonth >= @TradedMonthStart
ORDER BY RawTrades.FirstSequenceItemID, RawTrades.LastUpdate DESC
---- GET ALL THE END OF DAY PRICINGS FOR QUARTERLYS ----
SELECT
ROW_NUMBER() OVER
(
PARTITION BY RawTrades.FirstSequenceItemName,
CONVERT(VARCHAR, RawTrades.LastUpdate, 103)
ORDER BY RawTrades.FirstSequenceItemName, RawTrades.LastUpdate DESC
) AS [Row],
CONVERT(VARCHAR, RawTrades.LastUpdate, 103) AS LastUpdate,
Sequences.ActualStartMonth,
Sequences.ActualEndMonth,
RawTrades.Price
INTO #quarterly
FROM RawTrades
INNER JOIN Sequences ON RawTrades.FirstSequenceItemId = Sequences.SequenceItemId AND RawTrades.FirstSequenceId = Sequences.SequenceId
WHERE RawTrades.FirstSequenceID IN (SELECT QuarterlySequenceId FROM Instruments WHERE InstrumentGroupId = @InstrumentGroupId)
AND Action <> 'Remove'
AND LastUpdate >= @StartDate
AND LastUpdate < @EndDate
AND RawTrades.Price > 20
ORDER BY RawTrades.FirstSequenceItemID, RawTrades.LastUpdate DESC
---- GET ALL THE END OF DAY PRICINGS FOR SEASONALS ----
SELECT
ROW_NUMBER() OVER
(
PARTITION BY RawTrades.FirstSequenceItemName,
CONVERT(VARCHAR, RawTrades.LastUpdate, 103)
ORDER BY RawTrades.FirstSequenceItemName, RawTrades.LastUpdate DESC
) AS [Row],
CONVERT(VARCHAR, RawTrades.LastUpdate, 103) AS LastUpdate,
Sequences.ActualStartMonth,
Sequences.ActualEndMonth,
RawTrades.Price
INTO #seasonal
FROM RawTrades
INNER JOIN Sequences ON RawTrades.FirstSequenceItemId = Sequences.SequenceItemId AND RawTrades.FirstSequenceId = Sequences.SequenceId
WHERE RawTrades.FirstSequenceID IN (SELECT SeasonalSequenceId FROM Instruments WHERE InstrumentGroupId = @InstrumentGroupId)
AND Action <> 'Remove'
AND LastUpdate >= @StartDate
AND LastUpdate < @EndDate
AND RawTrades.Price > 20
ORDER BY RawTrades.FirstSequenceItemID, RawTrades.LastUpdate DESC
---- BEFORE WE INSERT RECORDS MAKE SURE WE DON'T ADD DUPLICATES ----
DELETE FROM LiveCurveWeightings
WHERE InstrumentGroupId = @InstrumentGroupId
AND CalculationDate = @EndDate
---- CALCULATE AND INSERT THE WEIGHTINGS ----
INSERT INTO LiveCurveWeightings (InstrumentGroupId, CalculationDate, TradedMonth, QuarterlyWeighting, SeasonalWeighting)
SELECT
@InstrumentGroupId,
@EndDate,
#monthly.ActualStartMonth,
AVG(COALESCE(#monthly.Price / #quarterly.Price,1)) AS QuarterlyWeighting,
AVG(COALESCE(#monthly.Price / #seasonal.Price,1)) AS SeasonalWeighting
FROM #monthly
LEFT JOIN #quarterly
ON #monthly.ActualStartMonth >= #quarterly.ActualStartMonth
AND #monthly.ActualEndMonth <= #quarterly.ActualEndMonth
AND #quarterly.[Row] = 1
AND #monthly.LastUpdate = #quarterly.LastUpdate
LEFT JOIN #seasonal
ON #monthly.ActualStartMonth >= #seasonal.ActualStartMonth
AND #monthly.ActualEndMonth <= #seasonal.ActualEndMonth
AND #seasonal.[Row] = 1
AND #monthly.LastUpdate = #seasonal.LastUpdate
WHERE #monthly.[Row] = 1
GROUP BY #monthly.ActualStartMonth
DROP TABLE #monthly
DROP TABLE #quarterly
DROP TABLE #seasonal
END

I think this issue may be due to your EF tracking graph getting too large. If you reuse your context in a batch operation with change tracking on, then every time you perform an operation it needs to enumerate the graph. With a few hundred items this isn't an issue, but when you get into the thousands it can become a massive problem. Take a look at my article on this and see whether you think it matches the issue.
In the article's graph for insert operations, you can see that at around 1,000 inserts (with tracking on) execution time starts to spike sharply (also note the log scales on the axes).

Related

SQL Server looping through inside stored procedure

I have a C# Windows application that deals with products. In my application we fetch the date when a product was added to the inventory, and based on the product's AddDate we get its age in months.
Let's say the age of a product is 25 months.
int Age = 25;
for(int i = Age; i >=0; i --)
{
var result = GetProductData(DateTime.Now.AddMonths(-i));
}
The GetProductData() method calls a stored procedure, so if the age of the product is 25 months, the stored procedure gets called 25 times.
In the stored procedure, we extract the month and year parts from the DateTime and store them in two separate variables. This is how it currently looks:
CREATE PROCEDURE usp_GetProductData
@AppId INT,
@Date DATETIME
AS
BEGIN
DECLARE @Month INT
DECLARE @Year INT
DECLARE @ProductInstall INT
SELECT @Month = DATEPART(m, @Date)
SELECT @Year = YEAR(@Date)
SELECT @ProductInstall = (SUM(P.[AutoInstalls]) + SUM(P.[ITInstalls]))
FROM dbo.[ProductInstalls] P
INNER JOIN [User] U ON U.[UserId] = P.[UserId]
WHERE LicenseRequired = 1
AND DATEPART(m, P.[InstallDate]) = @Month
AND DATEPART(year, P.[InstallDate]) = @Year
SELECT
AVG(A.[TotalRequests] - A.[TotalInstalls]) * 100 AS [ProductAverage],
@ProductInstall AS [ProductInstall], @Month AS [Month], @Year AS [Year]
FROM
dbo.[ApplicationInstalls] A
/* There are a few more joins and some business logic after this */
WHERE
DATEPART(m, A.[InstallDate]) = @Month
AND DATEPART(year, A.[InstallDate]) = @Year
END
Now, instead of calling the stored procedure as many times as the age of the product/application, I want to do it in a single request, since I already have the date the application/product was added to the inventory:
DECLARE @ProductAddDate DATETIME
DECLARE @ProductAge INT
SELECT @ProductAddDate = [DateAdded] FROM dbo.[Application] WHERE [AppId] = @AppId
SELECT @ProductAge = DATEDIFF(DAY, @ProductAddDate, GETDATE())/30
With that product age, I want to run the logic below for every month.
SELECT
@ProductInstall = (SUM(P.[AutoInstalls]) + SUM(P.[ITInstalls]))
FROM
dbo.[ProductInstalls] P
INNER JOIN
[User] U ON U.[UserId] = P.[UserId]
WHERE
LicenseRequired = 1
AND DATEPART(m, P.[InstallDate]) = @Month
AND DATEPART(year, P.[InstallDate]) = @Year
SELECT
AVG(A.[TotalRequests] - A.[TotalInstalls]) * 100 AS [ProductAverage], @ProductInstall AS [ProductInstall],
@Month AS [Month], @Year AS [Year]
FROM
dbo.[ApplicationInstalls] A
/* There are a few more joins and some business logic after this */
WHERE
DATEPART(m, A.[InstallDate]) = @Month
AND DATEPART(year, A.[InstallDate]) = @Year
I'm not sure whether your core logic really represents 'application' data or product data, but I'll keep the proc name as 'getProductData'. Here's what it does:
First, I look up the product add date for the application, and in the 'monthYears' CTE I create month and year bins that run from the add date to the present date. This recursive CTE is the only 'looping' I do. Otherwise, the healthier approach is to use the set-based operations available in SQL Server, which are much more efficient.
Then, in the 'productInstalls' CTE, I take your logic, omit the month and year filters, and instead group by month and year to get them all at once.
I do the same for your application installs logic in the 'applicationInstalls' CTE.
Finally, I join it all together by year and month in the core query.
Here's the code:
create procedure getProductData
@AppId int
as
declare @ProductAddDate date = (
select dateAdded
from [application]
where appId = @AppId
);
with
monthYears as (
select mo = datepart(m, @ProductAddDate),
yr = datepart(year, @ProductAddDate),
i = 0
union all
select mo = datepart(m, dateAdd(m, i+1, @ProductAddDate)),
yr = datepart(year, dateAdd(m, i+1, @ProductAddDate)),
i = i+1
from monthYears
-- stop at the current month (compare whole months, not month and year separately)
where i < datediff(m, @ProductAddDate, getdate())
),
productInstalls as (
select mo = datepart(m, p.installDate),
yr = datepart(year, p.installDate),
ProductInstall = sum(p.autoinstalls) + sum(p.itinstalls)
-- schema-qualified so it refers to the table, not to this CTE
from dbo.productInstalls p
join [user] u on u.userId = p.userId
where licenseRequired = 1
group by datepart(m, p.installDate),
datepart(year, p.installDate)
),
applicationInstalls as (
select mo = datepart(m, a.installDate),
yr = datepart(year, a.installDate),
ProductAverage = avg(a.totalRequests - a.totalInstalls) * 100
from dbo.applicationInstalls a
group by datepart(m, a.installDate),
datepart(year, a.installDate)
)
select my.yr,
my.mo,
ProductAverage = isnull(a.ProductAverage, 0),
ProductInstall = isnull(p.ProductInstall, 0)
from monthYears my
left join applicationInstalls a on my.yr = a.yr and my.mo = a.mo
left join productInstalls p on my.yr = p.yr and my.mo = p.mo
-- the default recursion limit of 100 would cap the bins at roughly eight years
option (maxrecursion 0);
When you get this in C#, it will come back as a collection: most likely a DataTable, or your business layer might expose it as an IEnumerable or a List of some type. So your looping, if necessary, will happen in the C# code, not in the SQL code.
Something like:
int appId = 0; // or whatever
foreach(var productDatum in GetProductData(appId)) {
// do something with productDatum
}
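To try the set-based shape of this answer without a SQL Server instance, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in. It assumes simplified, hypothetical tables (the [User] join and the extra business logic are left out) and uses SQLite's date() and strftime() in place of DATEPART/DATEADD. The CTEs get names distinct from the tables so that no CTE ever mentions its own name. As in the T-SQL, one recursive CTE builds the month bins, each aggregate is computed for all months at once with GROUP BY, and LEFT JOINs fill empty months with 0.

```python
import sqlite3

def build_demo_db():
    """In-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Application (AppId INTEGER PRIMARY KEY, DateAdded TEXT);
        CREATE TABLE ProductInstalls (InstallDate TEXT, AutoInstalls INTEGER,
                                      ITInstalls INTEGER, LicenseRequired INTEGER);
        CREATE TABLE ApplicationInstalls (InstallDate TEXT, TotalRequests INTEGER,
                                          TotalInstalls INTEGER);
        INSERT INTO Application VALUES (1, '2018-11-15');
        INSERT INTO ProductInstalls VALUES
            ('2018-11-20', 2, 3, 1), ('2018-11-25', 1, 0, 1),
            ('2019-01-05', 4, 4, 0),  -- not licensed, so excluded
            ('2019-02-01', 5, 5, 1);
        INSERT INTO ApplicationInstalls VALUES
            ('2018-12-03', 10, 4), ('2018-12-20', 8, 6);
    """)
    return conn

MONTHLY_SQL = """
WITH RECURSIVE monthYears(d) AS (
    -- anchor: first day of the month the application was added
    SELECT date(DateAdded, 'start of month') FROM Application WHERE AppId = :app
    UNION ALL
    -- the only "loop": step one month at a time up to today's month
    SELECT date(d, '+1 month') FROM monthYears
    WHERE date(d, '+1 month') <= date(:today, 'start of month')
),
installsByMonth AS (   -- every month aggregated at once, no month filter
    SELECT strftime('%Y-%m', InstallDate) AS ym,
           SUM(AutoInstalls) + SUM(ITInstalls) AS ProductInstall
    FROM ProductInstalls
    WHERE LicenseRequired = 1
    GROUP BY ym
),
averagesByMonth AS (
    SELECT strftime('%Y-%m', InstallDate) AS ym,
           AVG(TotalRequests - TotalInstalls) * 100 AS ProductAverage
    FROM ApplicationInstalls
    GROUP BY ym
)
SELECT strftime('%Y-%m', my.d) AS ym,
       COALESCE(a.ProductAverage, 0) AS ProductAverage,
       COALESCE(p.ProductInstall, 0) AS ProductInstall
FROM monthYears my
LEFT JOIN averagesByMonth a ON a.ym = strftime('%Y-%m', my.d)
LEFT JOIN installsByMonth p ON p.ym = strftime('%Y-%m', my.d)
ORDER BY my.d
"""

def product_data_by_month(conn, app_id, today):
    """One round trip: one row per month since the application was added."""
    return conn.execute(MONTHLY_SQL, {"app": app_id, "today": today}).fetchall()

for row in product_data_by_month(build_demo_db(), 1, "2019-02-10"):
    print(row)
```

Months with no activity (here December for installs, and January entirely) still appear, because the month bins drive the query and the aggregates are only LEFT JOINed to them.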
I don't have an "Answer" per se, but I do have an idea that might be worth looking into more deeply. SQL Server has the concept of window functions. Itzik Ben-Gan wrote a book on them a few years back, and you can find plenty of material on them online.
The con is that you have to start thinking differently to program this way. The pro is that, where applicable, there is no recursion: the window flows down through the table in a single scan, gathering the aggregates as it goes. A few years back I used this to detect scheduling conflicts in a large-scale app, reducing the time to return results from 10-20 seconds to about 0.5 seconds. You may not need this performance bump, but if you can figure out how to apply it to your problem, you avoid a lot of table (or index) scans.
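A minimal sketch of the single-scan idea, applied to the scheduling-conflict case mentioned above. It uses Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 and later support window functions), and the Bookings table and its columns are hypothetical. A running MAX(EndTime) over the earlier rows of each resource is used rather than just LAG(EndTime), so a booking that overlaps any earlier booking is caught, not only one that overlaps its immediate predecessor.

```python
import sqlite3

CONFLICT_SQL = """
SELECT Id, Resource, StartTime, EndTime
FROM (
    SELECT Id, Resource, StartTime, EndTime,
           -- latest end time among the earlier bookings of the same resource
           MAX(EndTime) OVER (
               PARTITION BY Resource
               ORDER BY StartTime, Id
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS PrevMaxEnd
    FROM Bookings
)
WHERE StartTime < PrevMaxEnd   -- NULL for a resource's first booking, never matches
ORDER BY Id
"""

def find_conflicts(bookings):
    """bookings: list of (resource, start, end) as sortable ISO-8601 strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Bookings (Id INTEGER PRIMARY KEY, Resource TEXT,"
                 " StartTime TEXT, EndTime TEXT)")
    conn.executemany("INSERT INTO Bookings (Resource, StartTime, EndTime)"
                     " VALUES (?, ?, ?)", bookings)
    return conn.execute(CONFLICT_SQL).fetchall()

sample = [
    ("A", "2019-03-11 09:00", "2019-03-11 10:00"),
    ("A", "2019-03-11 09:30", "2019-03-11 10:30"),  # overlaps booking 1
    ("A", "2019-03-11 10:30", "2019-03-11 11:00"),  # starts exactly at the end: OK
    ("B", "2019-03-11 09:00", "2019-03-11 12:00"),
    ("B", "2019-03-11 10:00", "2019-03-11 10:15"),  # inside booking 4
    ("B", "2019-03-11 11:00", "2019-03-11 11:30"),  # still inside booking 4
]
print([row[0] for row in find_conflicts(sample)])  # prints [2, 5, 6]
```

Booking 6 shows why the running maximum matters: its immediate predecessor (booking 5) ends at 10:15, so a LAG-based check would miss the overlap with booking 4.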

How to select +1 record from MSSQL using EF with single query?

In short:
I have records with a CreationTime column in the database. I want to select the records from the last 2 days PLUS the one record that follows them (sorted by creation date descending), however old it is.
So, given the records below (knowing that today's date is 11 March), I want to select all records that are at most 2 days old, plus one:
1. 2019-03-11
2. 2019-03-11
3. 2019-03-10
4. 2019-03-08
5. 2019-03-07
6. 2019-03-16
So the result should contain records 1, 2, 3 and 4 (record 4 is 3 days old, but it is the "+1" record I need).
I'm using MSSQL and Entity Framework on .NET 4.6.1.
IMO a cleaner way to achieve this is to write two queries: the first gets the data from the last two days, and the second gets the latest record older than 2 days.
To get the records from the last 2 days:
select * from MyTable where CreationTime between getdate() - 2 and getdate()
To get the additional record:
select top 1 * from MyTable where CreationTime < getdate() - 2 order by CreationTime desc
Using EF with LINQ methods (dc is the database context), computing the cutoff once so both queries agree:
var now = DateTime.Now;
var cutoff = now.AddDays(-2);
To get the records from the last 2 days:
dc.Entities.Where(e => e.CreationTime <= now && e.CreationTime >= cutoff);
The additional record (null if there is none):
dc.Entities.Where(e => e.CreationTime < cutoff).OrderByDescending(e => e.CreationTime).FirstOrDefault();
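A runnable sketch of the two-query approach, using Python's sqlite3 in place of SQL Server and EF, with the dates from the question. It assumes CreationTime is stored as 'YYYY-MM-DD HH:MM:SS' text (so string comparison follows time order) and pins "now" to 2019-03-11 12:00 so the result is reproducible. Both queries derive the same cutoff from that one value, and Id breaks ties between equal timestamps.

```python
import sqlite3

def last_two_days_plus_one(conn, now):
    """Rows from the last 2 days, plus the newest row older than that."""
    recent = conn.execute(
        "SELECT Id, CreationTime FROM MyTable"
        " WHERE CreationTime BETWEEN datetime(?, '-2 days') AND ?"
        " ORDER BY CreationTime DESC, Id",
        (now, now),
    ).fetchall()
    extra = conn.execute(
        "SELECT Id, CreationTime FROM MyTable"
        " WHERE CreationTime < datetime(?, '-2 days')"
        " ORDER BY CreationTime DESC, Id LIMIT 1",  # the "+1" record
        (now,),
    ).fetchall()
    return recent + extra

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, CreationTime TEXT)")
conn.executemany(
    "INSERT INTO MyTable (Id, CreationTime) VALUES (?, ?)",
    [(1, "2019-03-11 00:00:00"), (2, "2019-03-11 00:00:00"),
     (3, "2019-03-10 00:00:00"), (4, "2019-03-08 00:00:00"),
     (5, "2019-03-07 00:00:00"), (6, "2019-03-16 00:00:00")],
)
print([row[0] for row in last_two_days_plus_one(conn, "2019-03-11 12:00:00")])
# prints [1, 2, 3, 4]
```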
Try the following logic:
DECLARE @T TABLE
(
SeqNo INT IDENTITY(1,1),
MyDate DATETIME
)
INSERT INTO @T
VALUES(GETDATE())
,(DATEADD(MINUTE,-23,GETDATE()))
,(DATEADD(MINUTE,-78,GETDATE()))
,(DATEADD(MINUTE,-5443,GETDATE()))
,(DATEADD(MINUTE,-34,GETDATE()))
,(DATEADD(MINUTE,-360,GETDATE()))
,(DATEADD(MINUTE,-900,GETDATE()))
,(DATEADD(MINUTE,-1240,GETDATE()))
,(DATEADD(MINUTE,-3600,GETDATE()))
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY CAST(MyDate AS DATE) ORDER BY MyDate DESC),
DateSeq = DATEDIFF(DAY,MyDate,GETDATE()),
*
FROM @T
)
SELECT
*
FROM CTE
WHERE
DateSeq <2
OR
(
DateSeq = 2
AND
RN = 1
)
You can try the following query.
DECLARE @table TABLE(StartDate DATETIME)
INSERT INTO @table
VALUES('2019-03-11'),('2019-03-11'),('2019-03-10'),
('2019-03-08'),('2019-03-07'),('2019-03-16')
SELECT * FROM @table WHERE StartDate BETWEEN GETDATE()-4 AND GETDATE()
To get the older 4th entry:
SELECT * FROM @table
ORDER BY (select null)
OFFSET (select Count(*) from @table where StartDate BETWEEN GETDATE()-2 AND
GETDATE()) ROWS
FETCH NEXT 1 ROWS ONLY

SQL Server - DATEDIFF Function taking too long

I've done some extensive research and concluded that the DATEDIFF function is making my queries run very slowly.
Below is the query generated by Entity Framework; hopefully it is readable enough.
Here's the LINQ that generates the T-SQL:
model.NewTotal1Week = ( from sdo in context.SubscriberDebitOrders
where
(
sdo.CampaignId == campaignId &&
( sdo.Status == ( Int32 ) DebitOrderStatus.New_Faulty ) &&
( SqlFunctions.DateDiff( "week", sdo.Collections.FirstOrDefault( c => c.TxnStatus == "U" ).ProcessDate, DateTime.Now ) <= 1 )
)
select sdo ).Count();
With the query below, I would like to get a COUNT of all Collections that were processed within the last week (measured from the ProcessDate to today's date).
Can anyone help me get rid of the DATEDIFF function? I've seen examples online but couldn't adapt them to my scenario; forgive me, I'm not a genius yet.
exec sp_executesql N'SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[SubscriberDebitOrder] AS [Extent1]
OUTER APPLY (SELECT TOP (1)
[Extent2].[ProcessDate] AS [ProcessDate]
FROM [dbo].[Collections] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[DebitOrderId]) AND (''U'' = [Extent2].[TxnStatus]) ) AS [Limit1]
WHERE ([Extent1].[CampaignId] = @p__linq__0) AND (3 = [Extent1].[Status]) AND ((DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime())) <= 1)
) AS [GroupBy1]',N'@p__linq__0 int',@p__linq__0=3
go
Thanks in advance.
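It's not just DATEDIFF: any function wrapped around the column makes the predicate non-sargable, so the query has to SCAN the underlying table/index instead of seeking into it.
DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime()) <= 1
Is the logic above fetching last week's data? If so, you can write it without putting a function around the ProcessDate column (SYSDATETIME() returns datetime2, which does not support "- 7" arithmetic, so use DATEADD):
[Limit1].[ProcessDate] > DATEADD(DAY, -7, SYSDATETIME())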
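A rough way to see the difference is to compare query plans. This sketch uses SQLite through Python's sqlite3 module as a stand-in, so it shows SQLite's EXPLAIN QUERY PLAN wording rather than a SQL Server execution plan, and the Collections table is a cut-down, hypothetical version. Wrapping ProcessDate in a function leaves only a scan; comparing the bare column with a boundary computed once lets the index on ProcessDate be searched, and both predicates count the same rows.

```python
import sqlite3

def plan(conn, sql, params):
    """Join the 'detail' column (always the last one) of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Collections (Id INTEGER PRIMARY KEY, ProcessDate TEXT)")
conn.execute("CREATE INDEX IX_Collections_ProcessDate ON Collections (ProcessDate)")
conn.executemany("INSERT INTO Collections (ProcessDate) VALUES (?)",
                 [("2019-03-%02d 10:00:00" % day,) for day in range(1, 29)])
now = ("2019-03-28 12:00:00",)

# Function around the column: every row has to be read and computed.
NON_SARGABLE = ("SELECT COUNT(*) FROM Collections"
                " WHERE julianday(?) - julianday(ProcessDate) <= 7")
# Bare column against a boundary computed once: an index range search.
SARGABLE = ("SELECT COUNT(*) FROM Collections"
            " WHERE ProcessDate >= datetime(?, '-7 days')")

for name, sql in (("non-sargable", NON_SARGABLE), ("sargable", SARGABLE)):
    count = conn.execute(sql, now).fetchone()[0]
    print(name, count, plan(conn, sql, now))
```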
This is your query:
SELECT GroupBy1.A1 AS C1
FROM (SELECT COUNT(1) AS [A1]
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus)
) AS [Limit1]
WHERE (Extent1.CampaignId = @p__linq__0) AND (3 = Extent1.Status) AND
(DATEDIFF(week, Limit1.ProcessDate, SysDateTime()) <= 1)
) GroupBy1;
As mentioned elsewhere, you should change the date logic and get rid of the outer query:
SELECT COUNT(1) AS A1
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus)
) AS Limit1
WHERE (Extent1.CampaignId = @p__linq__0) AND (3 = Extent1.Status) AND
Limit1.ProcessDate >= DATEADD(week, -1, GETDATE())
Very important note: this is not exactly equivalent to your query. Your original query counted the number of week boundaries between the two dates. DATEDIFF(week, ...) is not affected by SET DATEFIRST; it always treats Sunday as the first day of the week, so it counts the Saturday-to-Sunday boundaries crossed, and "<= 1" can include rows processed up to 13 days ago.
Based on your description, the above is more correct.
Next, you want indexes on Collections(DebitOrderId, TxnStatus, ProcessDate) and SubscriberDebitOrder(CampaignId, Status).
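To make the week-boundary point concrete, here is a small Python sketch that emulates DATEDIFF(week, start, end) by counting Sunday boundaries crossed, which is how the SQL Server documentation describes DATEDIFF's week counting. The helper names are hypothetical. The point is that "<= 1 week" by boundaries is not the same filter as "within the last 7 days".

```python
from datetime import date, timedelta

def datediff_week(start: date, end: date) -> int:
    """Emulate SQL Server DATEDIFF(week, start, end): Sunday boundaries crossed.

    date(1, 1, 1) is a Monday with toordinal() == 1, so ordinal // 7 goes up
    by one exactly on each Sunday.
    """
    return end.toordinal() // 7 - start.toordinal() // 7

def original_filter(process_date: date, today: date) -> bool:
    """The EF-generated predicate: DATEDIFF(week, ProcessDate, today) <= 1."""
    return datediff_week(process_date, today) <= 1

def seven_day_filter(process_date: date, today: date) -> bool:
    """The rewritten predicate: ProcessDate >= DATEADD(week, -1, today)."""
    return process_date >= today - timedelta(weeks=1)

today = date(2019, 3, 16)   # a Saturday
for days_ago in (1, 6, 7, 8, 13, 14):
    d = today - timedelta(days=days_ago)
    print(d, days_ago, original_filter(d, today), seven_day_filter(d, today))
```

With a Saturday as "today", the boundary-based filter still accepts the Sunday 13 days earlier, while the 7-day filter stops at the previous Saturday.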

Problem with SQL Query Tracking

Okay so here's my issue.
The user can go onto my site and retrieve 8 records at a time, and is then given the option to load more. These 8 records can be sorted by a parameter passed into the proc. When I get these 8 records on the front end, I have their IDs (hidden from the user, obviously), but the IDs are not in any specific order because the records can be sorted by a variety of things.
When they click "Load More", I should be able to get the next 8 records from the database, sorted in the SAME fashion as the first 8 were.
For example: "Give me the top 8 records sorted by age" -> click Load More -> "Give me the next 8 oldest records without showing me the ones I just saw."
How can I call the proc and make sure none from the first result set are returned though? I only want to return 8 records at a time for efficiency reasons.
SELECT TOP 8
m.message,
m.votes,
(geography::Point(@latitude, @longitude, 4326).STDistance(m.point)) * 0.000621371192237334 as distance,
m.location,
datediff(hour,m.timestamp, getdate()) as age,
m.messageId,
ml.voted,
ml.flagged
FROM
tblMessages m
left join tblIPMessageLink ml on m.messageid = ml.messageid
WHERE
m.timestamp >= DATEADD(day, DATEDIFF(day, 0, @date), 0)
and
m.timestamp < DATEADD(day, DATEDIFF(day, 0, @date), 1)
ORDER BY
CASE WHEN @sort = 'votes1' THEN m.votes END DESC,
CASE WHEN @sort = 'votes2' THEN m.votes END ASC,
CASE WHEN @sort = 'age1' THEN datediff(hour,m.timestamp, getdate()) END ASC,
CASE WHEN @sort = 'age2' THEN datediff(hour,m.timestamp, getdate()) END DESC,
CASE WHEN @sort = 'distance1' THEN (geography::Point(@latitude, @longitude, 4326).STDistance(m.point)) * 0.000621371192237334 END ASC,
CASE WHEN @sort = 'distance2' THEN (geography::Point(@latitude, @longitude, 4326).STDistance(m.point)) * 0.000621371192237334 END DESC
That's my current query. How would I change it to work with paging?
Use ROW_NUMBER(). For example:
Call 1:
;WITH cte AS(SELECT *,row_number() OVER( ORDER BY name) AS rows FROM sysobjects)
SELECT * FROM cte WHERE ROWS BETWEEN 1 AND 8
ORDER BY rows
Call 2:
;WITH cte AS(SELECT *,row_number() OVER( ORDER BY name) AS rows FROM sysobjects)
SELECT * FROM cte WHERE ROWS BETWEEN 9 AND 16
ORDER BY rows
Of course, you want to use parameters instead of hardcoding the numbers so that you can reuse the query. If the column can be sorted arbitrarily, you might need to use dynamic SQL.
Edit: here is what it should look like. You probably also want to return the maximum row number so that you know how many rows can potentially be returned.
You can also make the number of rows per page dynamic; in that case it would be something like:
where Rows between @StartRow and (@StartRow + @RowsPerPage) - 1
Make sure to read "Dynamic Search Conditions in T-SQL" (the version for SQL 2008) to see how you can optimize this to get plan reuse and a better plan in general.
Anyway, here is the proc (untested, of course, since I can't run it here):
DECLARE @StartRow INT, @EndRow INT
SELECT @StartRow = 1, @EndRow = 8 -- first page; in the real proc these would be parameters
;WITH cte AS (SELECT ROW_NUMBER() OVER (ORDER BY
CASE WHEN @sort = 'votes1' THEN m.votes END DESC,
CASE WHEN @sort = 'votes2' THEN m.votes END ASC,
CASE WHEN @sort = 'age1' THEN datediff(hour,m.timestamp, getdate()) END ASC,
CASE WHEN @sort = 'age2' THEN datediff(hour,m.timestamp, getdate()) END DESC,
CASE WHEN @sort = 'distance1' THEN (geography::Point(@latitude, @longitude, 4326).STDistance(m.point)) * 0.000621371192237334 END ASC,
CASE WHEN @sort = 'distance2' THEN (geography::Point(@latitude, @longitude, 4326).STDistance(m.point)) * 0.000621371192237334 END DESC,
m.messageId -- unique tiebreaker so rows with equal sort values keep their position between calls
) AS rows,
m.message,
m.votes,
(geography::Point(@latitude, @longitude, 4326).STDistance(m.point)) * 0.000621371192237334 as distance,
m.location,
datediff(hour,m.timestamp, getdate()) as age,
m.messageId,
ml.voted,
ml.flagged
FROM
tblMessages m
left join tblIPMessageLink ml on m.messageid = ml.messageid
WHERE
m.timestamp >= DATEADD(day, DATEDIFF(day, 0, @date), 0)
and
m.timestamp < DATEADD(day, DATEDIFF(day, 0, @date), 1)
)
SELECT *
FROM cte WHERE rows BETWEEN @StartRow AND @EndRow
ORDER BY rows
David Hayden has a nice article on paging. You'll just need to keep track of the number of records and the offset.
Also, you'll still need to merge and re-sort the records on the client every time they load more.
Here's the SP from that article:
CREATE PROCEDURE dbo.ShowLog
@PageIndex INT,
@PageSize INT
AS
BEGIN
WITH LogEntries AS (
SELECT ROW_NUMBER() OVER (ORDER BY Date DESC)
AS Row, Date, Description
FROM LOG)
SELECT Date, Description
FROM LogEntries
WHERE Row between
(@PageIndex - 1) * @PageSize + 1 and @PageIndex*@PageSize
END
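A small, runnable sketch of ROW_NUMBER paging with a CASE-driven sort, using Python's sqlite3 (SQLite 3.25 and later support ROW_NUMBER) instead of SQL Server. The table is a cut-down, hypothetical tblMessages with only votes and a timestamp, and four sort options stand in for the six in the proc. One assumption worth calling out: a unique tiebreaker (messageId) is appended to the ORDER BY, because without it rows with equal votes may be numbered differently on each call, and "Load More" could repeat or skip records. page_bounds uses the same (PageIndex - 1) * PageSize + 1 arithmetic as the ShowLog proc.

```python
import sqlite3

PAGE_SQL = """
WITH cte AS (
    SELECT ROW_NUMBER() OVER (
               ORDER BY
                   CASE WHEN :sort = 'votes1' THEN votes END DESC,
                   CASE WHEN :sort = 'votes2' THEN votes END ASC,
                   CASE WHEN :sort = 'age1' THEN ts END DESC,  -- newest first
                   CASE WHEN :sort = 'age2' THEN ts END ASC,   -- oldest first
                   messageId  -- unique tiebreaker: stable numbering across calls
           ) AS rn,
           messageId
    FROM tblMessages
)
SELECT messageId FROM cte WHERE rn BETWEEN :start AND :end ORDER BY rn
"""

def page_bounds(page_index, page_size):
    """1-based page number -> inclusive row-number range."""
    return (page_index - 1) * page_size + 1, page_index * page_size

def get_page(conn, sort, page_index, page_size=8):
    start, end = page_bounds(page_index, page_size)
    rows = conn.execute(PAGE_SQL, {"sort": sort, "start": start, "end": end})
    return [message_id for (message_id,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMessages (messageId INTEGER PRIMARY KEY,"
             " votes INTEGER, ts TEXT)")
conn.executemany("INSERT INTO tblMessages VALUES (?, ?, ?)",
                 [(i, i % 7, "2019-03-11 %02d:00:00" % i) for i in range(1, 21)])

for page in (1, 2, 3):   # "Load More" three times, sorted by votes descending
    print(page, get_page(conn, "votes1", page))
```

Because every page is cut from the same deterministic numbering, the pages never overlap and together cover every row exactly once.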

Execute multiple aggregates at once (SQL, LINQ).. or with better performance?

I have a table with records which include a datetime column "CreationDate".
I need to get the following information for each of the last 90 days:
How many records were there in total in existence
How many records were added on that day
I could do this through a loop of counting of course, but this would hit the database 90 times... is there a better way of doing this aggregate without having to riddle the DB with requests?
I'm using C#, LINQ, SQL Server 2008.
Are you looking for something like this?
WITH CTE AS
(SELECT COUNT(*) OVER () AS TotalCount,
CAST(CONVERT(VARCHAR, CreationDate, 101) as DATETIME) as DateValue, *
FROM MyTable
WHERE CreationDate >= DATEADD(DD, -90, GETDATE())
)
SELECT DateValue, TotalCount, COUNT(*) as RowCount
FROM CTE
group by DateValue, TotalCount
order by DateValue
;
Pull the records (or just the IDs and creation dates, if that is all you need) and then perform the logic in code. That is one SELECT against the DB.
Edit, in response to a comment:
You can get the number of items for each day with a query like this:
SELECT CreationDate, COUNT(CreationDate) FROM MyTable GROUP BY CreationDate
Note that this assumes there are no times in CreationDate. If the times differ, the grouping won't work and you'll have to flatten them out, for example by grouping on CAST(CreationDate AS DATE) on SQL Server 2008.
You can also add a WHERE clause to only look at the items from the last 90 days.
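A minimal Python sketch of the "one SELECT, then aggregate in code" suggestion. It assumes all creation timestamps have already been fetched in a single query (for example SELECT CreationDate FROM MyTable); times are flattened to dates before counting, and anything created before the window seeds the running total. daily_stats is a hypothetical helper name.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_stats(creation_times, today, days=90):
    """Return (day, added_that_day, total_in_existence) for the last `days` days.

    creation_times: datetimes from one query, e.g. SELECT CreationDate FROM MyTable.
    """
    first_day = today - timedelta(days=days - 1)
    added = Counter(t.date() for t in creation_times)  # flatten times to dates
    # Everything created before the window is the starting total.
    running = sum(n for day, n in added.items() if day < first_day)
    stats = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        running += added[day]          # Counter returns 0 for days with no rows
        stats.append((day, added[day], running))
    return stats

times = [datetime(2019, 1, 1, 8), datetime(2019, 3, 9, 10),
         datetime(2019, 3, 9, 15), datetime(2019, 3, 11, 9)]
for row in daily_stats(times, date(2019, 3, 11), days=3):
    print(row)
```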
Bringing back the daily totals for the 90 days and then aggregating in your application would probably be the best idea. There is no particularly satisfactory way of calculating running totals in SQL Server 2008. Here is an example of how you could do it anyway (using sys.objects as the demo table):
IF OBJECT_ID('tempdb..#totals') IS NOT NULL
DROP TABLE #totals
DECLARE @EndDate DATE = CURRENT_TIMESTAMP;
DECLARE @StartDate DATE = DATEADD(DAY,-89,@EndDate);
WITH DateRange AS
(
SELECT
@StartDate [DATE]
UNION ALL
SELECT
DATEADD(DAY, 1, DATE) [DATE]
FROM
DateRange
WHERE
DATE < @EndDate
)
SELECT DATE,COUNT(t.modify_date) AS DailyTotal
INTO #totals
FROM DateRange LEFT JOIN sys.objects t
ON modify_date BETWEEN @StartDate AND @EndDate
AND CAST(t.modify_date AS DATE) = DateRange.Date
GROUP BY DATE
ORDER BY DATE
DECLARE @BaseNumber INT = (SELECT COUNT(*) FROM sys.objects WHERE
modify_date < @StartDate);
SELECT t1.Date,
t1.DailyTotal,
@BaseNumber + SUM(t2.DailyTotal) AS RunningTotal
FROM #totals t1
JOIN #totals t2 ON t2.date <= t1.date
/*Triangular join will yield 91x45 rows that are then grouped*/
GROUP BY t1.Date,t1.DailyTotal
ORDER BY t1.Date
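On SQL Server 2012 and later (not the 2008 the question targets), SUM() OVER (ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) computes a running total in one ordered pass. This sketch uses Python's sqlite3 (SQLite 3.25 and later) as a stand-in to check that the windowed form returns the same numbers as the triangular join; the three-day totals table and the base count of 1 are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (Day TEXT PRIMARY KEY, DailyTotal INTEGER)")
conn.executemany("INSERT INTO totals VALUES (?, ?)",
                 [("2019-03-09", 2), ("2019-03-10", 0), ("2019-03-11", 1)])
params = {"base": 1}   # rows that already existed before the first day

# Triangular join: each day is joined to every earlier-or-equal day, then summed.
TRIANGULAR = """
SELECT t1.Day, t1.DailyTotal, :base + SUM(t2.DailyTotal) AS RunningTotal
FROM totals t1 JOIN totals t2 ON t2.Day <= t1.Day
GROUP BY t1.Day, t1.DailyTotal
ORDER BY t1.Day
"""
# Window aggregate: one ordered pass, no self-join.
WINDOWED = """
SELECT Day, DailyTotal,
       :base + SUM(DailyTotal) OVER (
           ORDER BY Day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS RunningTotal
FROM totals
ORDER BY Day
"""

triangular = conn.execute(TRIANGULAR, params).fetchall()
windowed = conn.execute(WINDOWED, params).fetchall()
print(windowed)
```

The triangular join's intermediate row count grows with the square of the number of days, while the window version reads each day once.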
