Count consecutive vacation days, skipping over holidays and weekends - C#

I have a table that records users' vacation days.
A sample of that would be:
+---------+-----------+---------+------------+
| country | user_name | user_id | vac_date |
+---------+-----------+---------+------------+
| canada | James | 1111 | 2015-02-13 |
| canada | James | 1111 | 2015-02-17 |
| canada | James | 1111 | 2015-02-18 |
| canada | James | 1111 | 2015-02-10 |
| canada | James | 1111 | 2015-02-11 |
+---------+-----------+---------+------------+
With the above data, the count should be 3 from Feb 13th to Feb 18th, because the 14th and 15th are a weekend and the 16th is a holiday here in Canada. Essentially, I am trying to keep the count going if the user took the next working day off. I also have a table of all the holidays, which includes the country and the date of each holiday. Sample data for the holiday table would be:
+---------+-------------+-------------+
| country | holidayDesc | holidayDate |
+---------+-------------+-------------+
| canada | Family Day | 2015-02-16 |
+---------+-------------+-------------+
Currently I have a query in SQL that counts the dates normally, so it only counts whatever is in the vacation table. For example, if a user took March 3rd, 4th, and 5th 2015 off, it gives a count of 3; but for the table above, it gives a count of 1 for Feb 13th and 2 for Feb 17th to Feb 18th.
SELECT user_name
    ,min(vac_date) AS startDate
    ,max(vac_date) AS endDate
    -- +1 because a one-day run has a datediff of 0
    ,datediff(day, min(vac_date), max(vac_date)) + 1 AS consecutiveCount
FROM (
    SELECT user_name
        ,vac_date
        ,user_id
        -- every date in an unbroken run minus its row number gives the same groupDate
        ,groupDate = DATEADD(DAY, - ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY vac_date
            ), vac_date)
    FROM mytable
    WHERE country = 'canada'
        AND vac_date BETWEEN '20150101'
            AND '20151231'
    ) z
GROUP BY user_name
    ,groupDate
ORDER BY user_name
    ,min(vac_date);
This is what it currently outputs from the above sample data:
+-----------+------------+------------+------------------+
| user_name | startDate | endDate | consecutiveCount |
+-----------+------------+------------+------------------+
| James | 2015-02-10 | 2015-02-11 | 2 |
| James | 2015-02-13 | 2015-02-13 | 1 |
| James | 2015-02-17 | 2015-02-18 | 2 |
+-----------+------------+------------+------------------+
Ideally I would like it to be:
+-----------+------------+------------+------------------+
| user_name | startDate | endDate | consecutiveCount |
+-----------+------------+------------+------------------+
| James | 2015-02-10 | 2015-02-11 | 2 |
| James | 2015-02-13 | 2015-02-18 | 3 |
+-----------+------------+------------+------------------+
But I don't know if that is possible with pure SQL. I can also try to incorporate it into C#.
If it helps, I am using C# and SQL Server Management Studio. Any help would be appreciated. Thanks in advance.
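The current query relies on the classic gaps-and-islands trick: subtracting each date's row number from the date gives the same key for every date in an unbroken run of calendar days. Here is a minimal Python sketch of that idea. The `islands` helper is hypothetical, handles a single user, and drops duplicate dates first. It reproduces the current output shown above:

```python
from datetime import date, timedelta
from itertools import groupby


def islands(dates):
    """Group dates into runs of consecutive calendar days for one user.

    Mirrors groupDate = DATEADD(DAY, -ROW_NUMBER() OVER (...), vac_date):
    every date in an unbroken run minus its 1-based position maps to the
    same key, so grouping by that key yields the runs.
    """
    ordered = sorted(set(dates))  # duplicates would split a run, so drop them
    keyed = [(d - timedelta(days=i + 1), d) for i, d in enumerate(ordered)]
    runs = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [d for _, d in group]
        runs.append((run[0], run[-1], len(run)))  # (startDate, endDate, count)
    return runs


vacation = [date(2015, 2, 13), date(2015, 2, 17), date(2015, 2, 18),
            date(2015, 2, 10), date(2015, 2, 11)]
for start, end, count in islands(vacation):
    print(start, end, count)
```

The key only stays constant across consecutive calendar days. As a result, the weekend and holiday on Feb 14-16 break the run, which is why Feb 13th ends up on its own.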

I tried a different route, but then found a fix for John Cappelletti's solution.
First, add the weekend dates to your holiday table.
(You can get a list of dates between two dates using a function.)
Then UNION ALL the vacation days with the holidays, adding a description field so you can tell the two apart.
The holiday rows are repeated for every user, so each user gets the holidays and weekends (this needs testing).
SELECT [country],
       [user_name], [user_id], [vac_date], 'vacation' AS description
FROM vacations
UNION ALL
-- repeat each holiday/weekend row for every user in the same country
SELECT u.[country],
       u.[user_name],
       u.[user_id],
       h.[holidayDate],
       'holiday' AS description
FROM holidays h
JOIN (SELECT DISTINCT [country], [user_name], [user_id] FROM vacations) u
    ON u.[country] = h.[country]
Then the final query is the same as John suggested, but this time you only count vacation days.
WITH joinDates AS (
    SELECT [country],
           [user_name], [user_id], [vac_date], 'vacation' AS description
    FROM vacations
    UNION ALL
    SELECT u.[country],
           u.[user_name],
           u.[user_id],
           h.[holidayDate],
           'holiday' AS description
    FROM holidays h
    JOIN (SELECT DISTINCT [country], [user_name], [user_id] FROM vacations) u
        ON u.[country] = h.[country]
)
SELECT user_name
    ,startDate = min(vac_date)
    ,endDate = max(vac_date)
    ,consecutiveCount = count(*)
FROM (
    SELECT *
        -- day number minus row number is constant within a run of consecutive dates;
        -- DATEDIFF from a fixed date (instead of DAY()) keeps runs intact across month boundaries
        ,Grp = DATEDIFF(DAY, '19000101', vac_date) - ROW_NUMBER() OVER (PARTITION BY country, user_id ORDER BY vac_date)
    FROM joinDates S
    ) A
WHERE description = 'vacation' -- only count vacation days; ignore holidays/weekends
GROUP BY country, user_id, user_name, Grp
HAVING count(*) > 1
ORDER BY startDate
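The idea behind this answer is that weekends and holidays should not break a run, while an ordinary working day that was not taken off does. That idea can be sketched in Python for a single user. Instead of a UNION ALL plus row numbering, this sketch walks the calendar one day at a time, which yields the same runs. `vacation_runs` is a hypothetical helper, and it assumes Saturday and Sunday are the only weekend days:

```python
from datetime import date, timedelta


def vacation_runs(vacation, holidays):
    """Return (start, end, vacation_day_count) for each run of vacation days.

    Weekends and holidays between two vacation days keep the run going;
    a regular working day that was not taken off ends it. Only vacation
    days are counted, like WHERE description = 'vacation'.
    """
    vacation, holidays = set(vacation), set(holidays)
    runs, current = [], []
    if not vacation:
        return runs
    day, last = min(vacation), max(vacation)
    while day <= last:
        if day in vacation:
            current.append(day)
        elif day.weekday() < 5 and day not in holidays:  # Mon-Fri and not a holiday
            if current:
                runs.append(current)
            current = []
        day += timedelta(days=1)
    if current:
        runs.append(current)
    return [(run[0], run[-1], len(run)) for run in runs]


vacation = [date(2015, 2, 13), date(2015, 2, 17), date(2015, 2, 18),
            date(2015, 2, 10), date(2015, 2, 11)]
print(vacation_runs(vacation, [date(2015, 2, 16)]))
```

Unlike the query, this keeps single-day runs. To mimic `HAVING count(*) > 1`, filter the result on the count.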

This seems like a classic gaps-and-islands problem with a little twist.
Declare @YourTable table (country varchar(25),user_name varchar(25),user_id varchar(25),vac_date date)
Insert Into @YourTable values
('canada','James','1111','2015-02-13'),
('canada','James','1111','2015-02-17'),
('canada','James','1111','2015-02-18'),
('canada','James','1111','2015-02-10'),
('canada','James','1111','2015-02-11')
Declare @Holiday table (country varchar(25),holidayDate date)
Insert Into @Holiday values
('canada','2015-02-16')
Select user_name
      ,startDate = min(vac_date)
      ,endDate = max(vac_date)
      ,consecutiveCount = sum(DayCnt)
From (
      Select *
            -- day number minus row number is constant within a run of consecutive dates
            -- (DateDiff from a fixed date rather than Day(), so runs survive month boundaries)
            ,Grp = DateDiff(DD,'19000101',vac_date) - Row_Number() over (Partition By country,user_id Order by vac_date)
      From (
            -- vacation days (counted)
            Select Country,user_name,user_id,vac_date,DayCnt=1 from @YourTable
            Union All
            -- holidays directly next to a vacation day (counted)
            Select A.Country,user_name,user_id,vac_date=b.holidayDate,DayCnt=1
            From @YourTable A
            Join @Holiday B on A.country=B.country and abs(DateDiff(DD,vac_date,holidayDate))=1
            Union All
            -- weekend days directly next to a vacation day (not counted)
            Select A.Country,user_name,user_id,vac_date=b.retval,DayCnt=0
            From @YourTable A
            Join (
                  Select * From [dbo].[udf-Range-Date]('2015-01-01','2017-12-31','DD',1) where DateName(WEEKDAY,RetVal) in ('Saturday','Sunday')
                 ) B on abs(DateDiff(DD,vac_date,RetVal))=1
           ) S
     ) A
Group By user_name,Grp
Having Sum(DayCnt)>1
Returns
user_name startDate endDate consecutiveCount
James 2015-02-10 2015-02-11 2
James 2015-02-16 2015-02-18 3
The UDF that generates dynamic date ranges (this could be your own query):
CREATE FUNCTION [dbo].[udf-Range-Date] (@R1 datetime,@R2 datetime,@Part varchar(10),@Incr int)
Returns Table
Return (
    with cte0(M) As (Select 1+Case @Part When 'YY' then DateDiff(YY,@R1,@R2)/@Incr When 'QQ' then DateDiff(QQ,@R1,@R2)/@Incr When 'MM' then DateDiff(MM,@R1,@R2)/@Incr When 'WK' then DateDiff(WK,@R1,@R2)/@Incr When 'DD' then DateDiff(DD,@R1,@R2)/@Incr When 'HH' then DateDiff(HH,@R1,@R2)/@Incr When 'MI' then DateDiff(MI,@R1,@R2)/@Incr When 'SS' then DateDiff(SS,@R1,@R2)/@Incr End),
         cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
         cte2(N) As (Select Top (Select M from cte0) Row_Number() over (Order By (Select NULL)) From cte1 a, cte1 b, cte1 c, cte1 d, cte1 e, cte1 f, cte1 g, cte1 h ),
         cte3(N,D) As (Select 0,@R1 Union All Select N,Case @Part When 'YY' then DateAdd(YY, N*@Incr, @R1) When 'QQ' then DateAdd(QQ, N*@Incr, @R1) When 'MM' then DateAdd(MM, N*@Incr, @R1) When 'WK' then DateAdd(WK, N*@Incr, @R1) When 'DD' then DateAdd(DD, N*@Incr, @R1) When 'HH' then DateAdd(HH, N*@Incr, @R1) When 'MI' then DateAdd(MI, N*@Incr, @R1) When 'SS' then DateAdd(SS, N*@Incr, @R1) End From cte2 )
    Select RetSeq = N+1
          ,RetVal = D
    From  cte3,cte0
    Where D<=@R2
)
/*
Max 100 million observations -- Date Parts YY QQ MM WK DD HH MI SS
Syntax:
Select * from [dbo].[udf-Range-Date]('2016-10-01','2020-10-01','YY',1)
Select * from [dbo].[udf-Range-Date]('2016-01-01','2017-01-01','MM',1)
*/
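The first answer suggests adding weekend dates to the holiday table, and a UDF like the one above (or any date-range query) can generate them. Here is a rough Python equivalent for the daily case only. The hypothetical `date_range` and `weekend_dates` helpers list every Saturday and Sunday between two dates, ready to be inserted as rows:

```python
from datetime import date, timedelta


def date_range(start, end, step_days=1):
    """Yield dates from start to end inclusive, like udf-Range-Date with 'DD'."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=step_days)


def weekend_dates(start, end):
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return [d for d in date_range(start, end) if d.weekday() >= 5]


feb_2015 = weekend_dates(date(2015, 2, 1), date(2015, 2, 28))
print(feb_2015)
```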

OK, my understanding of the question is that you want to count each span of days off as a single instance. Many businesses call this an "occurrence of absence" to differentiate absences by cause. In this case, you're trying to treat holidays as a continuation of the vacation (for time purposes), so if a holiday occurs on a Friday and the person takes Monday off, that should be one contiguous time out.
Personally, I'd do this in C#, because properties of the DateTime object (such as DayOfWeek) make this a lot easier than building a frankenquery. The code below assumes that you have an Employee class that holds its own list of DateTimes, like so:
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public List<DateTime> DaysIWasOut { get; set; }
}

public static int TimeOut(IEnumerable<Employee> employees)
{
    int totalOutInstances = 0;
    DataTable dt = HolidaysPlease(); // this refers to another method to fill the table,
                                     // just a basic SqlDataAdapter.Fill kind of thing.
    // All holiday dates for quick lookups (column name taken from the question's holiday table)
    var holidays = new HashSet<DateTime>(
        dt.AsEnumerable().Select(t => Convert.ToDateTime(t["holidayDate"]).Date));
    foreach (var e in employees)
    {
        var daysOut = new HashSet<DateTime>(e.DaysIWasOut.Select(x => x.Date));
        totalOutInstances += daysOut.Count;
        foreach (var d in daysOut)
        {
            // Find the next working day, skipping weekends and holidays
            var next = d.AddDays(1);
            while (next.DayOfWeek == DayOfWeek.Saturday
                   || next.DayOfWeek == DayOfWeek.Sunday
                   || holidays.Contains(next))
                next = next.AddDays(1);
            // If that day was also taken off, this day continues the same absence: don't count it
            if (daysOut.Contains(next))
                totalOutInstances--;
        }
    }
    return totalOutInstances;
}

Related

Remove need for second T-SQL query

I am loading data from two tables into a repeater. The query against the second table only selects the MAX record, though, and because of this complexity I have to create a child repeater that goes off and finds the max record to display.
Table A: Activity List
ID | Activity
----+-----------------------
1 | Change Oil Filter
2 | Change brake fluid
3 | Change brake rotors
Table B: Mechanics Log
ID | ActivityID | Date       | Mechanic | Comment
---+------------+------------+----------+--------------------------
1  | 1          | 2019-06-27 | John     | Changed the oil filter
2  | 1          | 2019-06-26 | Sally    | No oil filters in stock.
3  | 2          | 2019-06-20 | Sally    | Brake fluid flushed.
As stated above, I can produce the following table using two repeaters (one inside the other):
ActivityID | Date       | Mechanic | Comment
-----------+------------+----------+------------------------
1          | 2019-06-27 | John     | Changed the oil filter
2          | 2019-06-20 | Sally    | Brake fluid flushed.
3          |            |          |
My question is: how can I produce the same table using only one repeater and one T-SQL query? Is it possible? The reason is that this is a very short version (shortened for this demonstration) of the full list I need for my mechanics' work log, and once I get to 100+ activities that can be done on a vehicle, the page loads quite slowly, presumably because it has to fire off the second repeater and its code for each record it has bound.
I also apologize that I do not yet have a 'starting point' for you to work with, as nothing I have created has come even close to producing the result in one query. I am having trouble working out how to combine the first part of the query with the MAX(Date) from the second table. Hoping for some assistance from the community.
You can use the query below to get the desired result.
Sample Data
Declare @ActivityList Table
(ID int, Activity varchar(100))
Insert into @ActivityList
values
(1 , 'Change Oil Filter' ),
(2 , 'Change brake fluid' ),
(3 , 'Change brake rotors' )
Declare @MechanicsLog Table
(ID int, ActivityID int, [Date] Date, Mechanic varchar(20), Comment varchar(50))
Insert into @MechanicsLog
values
(1 , 1 , '2019-06-27' , 'John' , 'Changed the oil filter' ),
(2 , 1 , '2019-06-26' , 'Sally' , 'No oil filters in stock.' ),
(3 , 2 , '2019-06-20' , 'Sally' , 'Brake fluid flushed.' )
Query
;With cte as
(select ActivityID, Max([Date]) [date] from @MechanicsLog ml
Group By ActivityID
)
Select al.ID, al.Activity, cte.[Date], Mechanic, Comment
from cte inner join @MechanicsLog ml
on cte.ActivityID = ml.ActivityID and cte.[date] = ml.[Date]
right join @ActivityList al on al.ID = ml.ActivityID
order by ID
If you use the ROW_NUMBER function to number the rows within each ActivityID (newest date first), you can then filter on that number to get only the most recent entry for each ActivityID.
select ActivityID, Date, Mechanic, Comment
from
(
select *, ROW_NUMBER() OVER (PARTITION BY ActivityID order by Date desc) RowNumber
from MechanicsLog
) q1
where RowNumber = 1
This gives you the "MAX" record for each ActivityID but with the rest of the record, so you can join to the Activity List table if you want.
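To see what the ROW_NUMBER() filter plus a join to the activity list returns, here is a small Python sketch over the question's sample data, using year-month-day dates. `latest_per_activity` is a hypothetical helper. On a date tie it keeps the first row seen, whereas ROW_NUMBER() would pick an arbitrary one. Like a LEFT JOIN from the activity list, it keeps activities that have no log entries:

```python
from datetime import date

activities = [(1, "Change Oil Filter"), (2, "Change brake fluid"),
              (3, "Change brake rotors")]
# (ID, ActivityID, Date, Mechanic, Comment)
log = [
    (1, 1, date(2019, 6, 27), "John", "Changed the oil filter"),
    (2, 1, date(2019, 6, 26), "Sally", "No oil filters in stock."),
    (3, 2, date(2019, 6, 20), "Sally", "Brake fluid flushed."),
]


def latest_per_activity(activities, log):
    """One row per activity with its newest log entry, or None values if it has none."""
    latest = {}
    for entry in log:
        activity_id, logged_on = entry[1], entry[2]
        # Keep the newest entry per ActivityID (ROW_NUMBER() ... ORDER BY Date DESC = 1);
        # on a date tie the first entry seen wins.
        if activity_id not in latest or logged_on > latest[activity_id][2]:
            latest[activity_id] = entry
    rows = []
    for activity_id, activity in activities:  # LEFT JOIN from the activity list
        entry = latest.get(activity_id)
        if entry is None:
            rows.append((activity_id, activity, None, None, None))
        else:
            rows.append((activity_id, activity, entry[2], entry[3], entry[4]))
    return rows


result = latest_per_activity(activities, log)
for r in result:
    print(r)
```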
select
act.ID as ActivityID, Max(ml.[Date]) as [Date]
from
ActivityList act
left join
MechanicsLog ml on ml.ActivityID = act.ID
Group by
act.ID

How to compare the current data to the next data in the SqlDataReader

I have this table:
ID
00001
00001
00002
00002
00003
00004
00004
00004
00005
If the SqlDataReader reaches the 5th row (the 00003), is it possible to check whether the next row has the same ID as the current one before finishing the current iteration of the while loop?
I will use this to detect when the reader is about to move to a different ID, so that I can draw the controls for the last part and then move on to the next ID.
I would love to post the code, but since I'm creating elements dynamically, it is very long.
EDIT:
Here is the code (cleaned up to be as simple as possible):
bool isFinished = false;
string lastID = "";
while (reader.Read())
{
    string transID = Convert.ToInt32(reader["ID"]).ToString("D5");
    lastID = (lastID == "") ? transID : lastID;
    isFinished = (lastID != transID) ? true : false;
    if (isFinished)
    {
        LastPart();
    }
    initialParts();
    lastID = transID;
}
With this code, LastPart() is called after all the rows with the same ID have been created. But LastPart() is never called after the last initialParts().
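The missing call happens because the loop only notices that a group has finished when the next row has a different ID, and the last group never sees one. One common fix is to close the final group once more after the loop. Here is a Python sketch of that pattern. The names `process_groups`, `initial_part` and `last_part` are hypothetical stand-ins for the loop, initialParts() and LastPart(). In the C# loop above, this would amount to calling LastPart() after the while loop when lastID is not empty:

```python
def process_groups(ids, initial_part, last_part):
    """Call initial_part for every row and last_part after each ID's final row."""
    last_id = None
    for current_id in ids:
        if last_id is not None and current_id != last_id:
            last_part(last_id)  # the previous ID's rows are finished
        initial_part(current_id)
        last_id = current_id
    if last_id is not None:
        last_part(last_id)  # close the final group, the call the original loop misses


events = []
ids = ["00001", "00001", "00002", "00002", "00003",
       "00004", "00004", "00004", "00005"]
process_groups(ids,
               lambda i: events.append(("initial", i)),
               lambda i: events.append(("last", i)))
print(events[-2:])
```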
Here is a sample in SQL (MariaDB/MySQL syntax). Put your own query in the UNION ALL section:
SELECT *
FROM
(SELECT @my_id:=@my_last_id AS new_id , @my_last_id:=id AS next_id
FROM
( SELECT *
FROM
( SELECT *
FROM
( SELECT "00001" AS id
UNION ALL SELECT "00001"
UNION ALL SELECT "00002"
UNION ALL SELECT "00002"
UNION ALL SELECT "00003"
UNION ALL SELECT "00004"
UNION ALL SELECT "00004"
UNION ALL SELECT "00004"
UNION ALL SELECT "00005" ) AS yourTable
UNION ALL SELECT NULL AS id ) tablePlusOneRow) newTable
CROSS JOIN
(SELECT @my_id:=NULL, @my_next_id:=NULL) init) AS new_table2
WHERE new_id IS NOT NULL;
Sample run:
MariaDB [sample]> SELECT *
-> FROM
-> (SELECT @my_id:=@my_last_id AS new_id , @my_last_id:=id AS next_id
-> FROM
-> ( SELECT *
-> FROM
-> ( SELECT *
-> FROM
-> ( SELECT "00001" AS id
-> UNION ALL SELECT "00001"
-> UNION ALL SELECT "00002"
-> UNION ALL SELECT "00002"
-> UNION ALL SELECT "00003"
-> UNION ALL SELECT "00004"
-> UNION ALL SELECT "00004"
-> UNION ALL SELECT "00004"
-> UNION ALL SELECT "00005" ) AS yourTable
-> UNION ALL SELECT NULL AS id ) tablePlusOneRow) newTable
-> CROSS JOIN
-> (SELECT @my_id:=NULL, @my_next_id:=NULL) init) AS new_table2
-> WHERE new_id IS NOT NULL;
+--------+---------+
| new_id | next_id |
+--------+---------+
| 00001 | 00001 |
| 00001 | 00002 |
| 00002 | 00002 |
| 00002 | 00003 |
| 00003 | 00004 |
| 00004 | 00004 |
| 00004 | 00004 |
| 00004 | 00005 |
| 00005 | NULL |
+--------+---------+
9 rows in set (0.00 sec)
MariaDB [sample]>
Speed with 1,000,000 rows. The times also include the time for the output: 0.28 sec for a plain SELECT vs. 0.79 sec for the query.
MariaDB [sample]> select * from myids;
.....
| 999998 |
| 999999 |
| 1000000 |
+---------+
1000000 rows in set (0.28 sec)
MariaDB [sample]>
SELECT * FROM (
SELECT @my_id:=@my_last_id AS new_id , @my_last_id:=id AS next_id
FROM (
SELECT *
FROM (
SELECT *
FROM (
SELECT * FROM myids
) AS yourTable
UNION ALL
SELECT NULL AS id
) tablePlusOneRow
) newTable
CROSS JOIN ( SELECT @my_id:=NULL, @my_next_id:=NULL) init
) AS new_table2
WHERE new_id IS NOT NULL;
....
| 999998 | 999999 |
| 999999 | 1000000 |
| 1000000 | NULL |
+---------+---------+
1000000 rows in set (0.79 sec)

Sales report in SQL Server 2008 merging same year into one?

I have the following query to display a sales report for each year and month:
SELECT
YEAR(orderDate) as SalesYear,
MONTH(orderDate) as SalesMonth,
SUM(Price) AS TotalSales
FROM Sales
GROUP BY YEAR(orderDate),MONTH(orderDate)
ORDER BY YEAR(orderDate), MONTH(orderDate)
output
2013 2 350.00
2013 5 350.00
2014 8 30.00
2014 11 30.00
2015 1 350.00
2015 8 120.00
But I need output like:
2013 2 700.00
2014 2 60.00
2015 2 470.00
Note: The month part should be the total number of months in each year.
Any help?
Thanks in Advance.
create table sales (orderdate date, price money);
insert into sales values
(N'2013-02-01', 350.00),
(N'2013-05-01', 350.00),
(N'2014-08-01', 30.00),
(N'2014-11-01', 30.00),
(N'2015-01-01', 350.00),
(N'2015-08-01', 120.00);
Alternatively, you could also use window functions SUM() OVER and COUNT() OVER to do this:
SELECT distinct
YEAR(orderDate) as SalesYear,
count(MONTH(orderDate)) OVER (PARTITION BY YEAR(OrderDate)) as SalesMonth,
SUM(Price) OVER (PARTITION BY YEAR(OrderDate)) AS TotalSales
FROM Sales
ORDER BY YEAR(orderDate);
Result:
+-----------+------------+------------+
| SalesYear | SalesMonth | TotalSales |
+-----------+------------+------------+
| 2013 | 2 | 700 |
| 2014 | 2 | 60 |
| 2015 | 2 | 470 |
+-----------+------------+------------+
Try:
SELECT
YEAR(orderDate) as SalesYear,
count(MONTH(orderDate)) as SalesMonth,
SUM(Price) AS TotalSales
FROM Sales
GROUP BY YEAR(orderDate)
ORDER BY YEAR(orderDate)
This is a fairly simple thing to do: since you want the count of months, count them. DISTINCT is the key part of the month count; without it you are counting orders, not months. If counting total orders is actually what you want, just count Price and save the processing time.
SELECT
YEAR(orderDate) as SalesYear,
Count(Distinct Month(orderDate)) as SalesMonth,
SUM(Price) AS TotalSales
FROM Sales
GROUP BY YEAR(orderDate)
ORDER BY YEAR(orderDate)
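The difference between COUNT(MONTH(orderDate)) and COUNT(DISTINCT MONTH(orderDate)) only shows up when a year has several orders in the same month. Here is a small Python sketch of the GROUP BY YEAR(orderDate) logic over the sample data used in the answers above. The helper `yearly_summary` is hypothetical. It returns both counts so they can be compared:

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def yearly_summary(sales):
    """Return (year, order_rows, distinct_months, total) per year, sorted by year."""
    rows = defaultdict(int)        # COUNT(MONTH(orderDate))
    months = defaultdict(set)      # COUNT(DISTINCT MONTH(orderDate))
    totals = defaultdict(Decimal)  # SUM(Price); Decimal() starts at 0
    for order_date, price in sales:
        year = order_date.year
        rows[year] += 1
        months[year].add(order_date.month)
        totals[year] += price
    return [(y, rows[y], len(months[y]), totals[y]) for y in sorted(totals)]


sales = [
    (date(2013, 2, 1), Decimal("350.00")),
    (date(2013, 5, 1), Decimal("350.00")),
    (date(2014, 8, 1), Decimal("30.00")),
    (date(2014, 11, 1), Decimal("30.00")),
    (date(2015, 1, 1), Decimal("350.00")),
    (date(2015, 8, 1), Decimal("120.00")),
]
for row in yearly_summary(sales):
    print(row)
```

With one order per month, the two counts agree. With a second February 2013 order, the row count for 2013 would become 3 while the distinct month count stays 2.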
Looking at the output of your first query, you need something like this (pseudocode):
SELECT YEAR, Count(Month), Sum(TotalSales)
FROM Sales
GROUP BY YEAR
so
SELECT
YEAR(orderDate) as SalesYear,
Count(MONTH(orderDate)) as SalesMonthCount,
SUM(Price) AS TotalSales
FROM Sales
GROUP BY YEAR(orderDate)
ORDER BY YEAR(orderDate)

combine multiple sql rows with different columns

Okay so say I have something like this:
ID | Name | Address
1 | Bob | 123 Fake Street
1 | Bob | 221 Other Street
done by doing something like:
select p.ID, p.Name, a.Address from People p
inner join Addresses a on a.OwnerID = p.ID
Is there any way to turn that into
ID | Name | Address_1 | Address_2 | etc...
1 | Bob | 123 Fake Street | 221 Other street | etc
I've seen solutions that put comma-separated values in one column, but I don't want that; I want distinct columns. I am querying this using MSSQL and C#, in case that changes anything. Also, this is a made-up scenario that is just similar to what I'm doing, so the actual structure of the tables can't be changed.
Anyone have any suggestions?
You can use the PIVOT function to get the result, but you will also need row_number() so that the multiple addresses per person can be converted into columns.
If you had a known number of addresses, then you would hard-code the query:
select id, name, address_1, address_2
from
(
select p.id, p.name, a.address,
'Address_'+cast(row_number() over(partition by p.id
order by a.ownerid) as varchar(10)) rn
from people p
inner join addresses a
on p.id = a.ownerid
) d
pivot
(
max(address)
for rn in (address_1, address_2)
) piv;
But in your case you will have an unknown number of addresses per person, so you will want to use dynamic SQL and place it in a stored procedure to execute:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME('Address_'+d.rn)
from
(
select cast(row_number() over(partition by a.ownerid
order by a.ownerid) as varchar(10)) rn
from Addresses a
) d
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT id, name, ' + @cols + '
from
(
select p.id, p.name, a.address,
''Address_''+cast(row_number() over(partition by p.id
order by a.ownerid) as varchar(10)) rn
from people p
inner join addresses a
on p.id = a.ownerid
) d
pivot
(
max(address)
for rn in (' + @cols + ')
) p '
execute(@query);
Both queries give this result:
| ID | NAME | ADDRESS_1 | ADDRESS_2 | ADDRESS_3 |
----------------------------------------------------------------
| 1 | Bob | 123 Fake Street | 221 Other Street | (null) |
| 2 | Jim | 123 e main street | (null) | (null) |
| 3 | Tim | 489 North Drive | 56 June Street | 415 Lost |
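The ROW_NUMBER() + PIVOT combination can be pictured as "number each person's addresses, then spread them across as many columns as the person with the most addresses needs". Here is a Python sketch of that reshaping, using the rows from the result above. `pivot_addresses` is a hypothetical helper. It numbers addresses in input order, whereas ROW_NUMBER() ordered by ownerid leaves the order within a person undefined:

```python
from collections import defaultdict


def pivot_addresses(rows):
    """Spread (id, name, address) rows into one dict per person.

    Mirrors ROW_NUMBER() OVER (PARTITION BY id ...) + PIVOT: number each
    person's addresses, then place them in Address_1..Address_k, where k is
    the largest address count (what the dynamic @cols list computes).
    """
    names = {}
    addresses = defaultdict(list)
    for person_id, name, address in rows:
        names[person_id] = name
        addresses[person_id].append(address)  # numbered in input order
    width = max((len(a) for a in addresses.values()), default=0)
    columns = [f"Address_{n}" for n in range(1, width + 1)]
    result = []
    for person_id in sorted(names):
        row = {"ID": person_id, "Name": names[person_id]}
        owned = addresses[person_id]
        for n, column in enumerate(columns):
            row[column] = owned[n] if n < len(owned) else None  # NULL in SQL
        result.append(row)
    return columns, result


rows = [(1, "Bob", "123 Fake Street"), (1, "Bob", "221 Other Street"),
        (2, "Jim", "123 e main street"),
        (3, "Tim", "489 North Drive"), (3, "Tim", "56 June Street"),
        (3, "Tim", "415 Lost")]
columns, result = pivot_addresses(rows)
print(columns)
```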

Identify a specific sequence of records in a table

Assume a table with the fields TransactionId, ItemId, Code, EffectiveDate, and CreateDate.
+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code | EffectiveDate | CreateDate |
+---------------+--------+------+------------------+------------------+
| 1| 1| 8| 12/2/2009 1:13 PM| 12/2/2009 1:13 PM|
+---------------+--------+------+------------------+------------------+
| 4| 1| 51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
| 2| 1| 14|12/2/2009 11:09 AM|12/2/2009 11:09 AM|
+---------------+--------+------+------------------+------------------+
| 3| 1| 61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
| 5| 1| 28| 12/3/2009 9:33 AM| 12/3/2009 9:33 AM|
+---------------+--------+------+------------------+------------------+
| 6| 1| 9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+
I need to get the set of records where the sequence 51, 61, 9 occurs for a given ItemId, sorted by EffectiveDate. There could be other records with other codes in between these records.
In this case, I would return TransactionIds 4, 3, and 6, as shown below.
+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code | EffectiveDate | CreateDate |
+---------------+--------+------+------------------+------------------+
| 4| 1| 51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
| 3| 1| 61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
| 6| 1| 9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+
Note that:
This isn't the only sequence I'll need to identify, but it illustrates the problem.
Records can be inserted into the table out of order; that is, the 61 record could have been inserted first, followed by the 51, and then the 9. You can see this in the example where for the code 51 record the CreateDate is later than the EffectiveDate.
The order of the sequence matters. So, the sequence 61, 9, 51 would not return any records, but 51, 61, 9 would.
A DB approach is ideal if it's simple (i.e. no cursors or overly complicated stored procedure), but a code approach could also work, although it would result in a significant amount of data transfer out of the DB.
The environment is SQL Server 2005 and C#/.NET 3.5.
Actually, you could get a couple of fairly simple solutions leveraging ranking/windowing functions and/or CTEs and recursive CTEs.
Create a procedure that accepts a character-based comma-separated list of Code values you are looking for in the sequence you want them in - use any of a dozen possible ways to split this list into a table/set that is made up of the sequence and Code value, resulting in a table with a structure like this:
declare @sequence table (sequence int not null, Code int not null);
Once you have this, it's simply a matter of sequencing the source set: join the sequence table to the source table on the same Code values for a given ItemId, number the filtered rows by EffectiveDate, and then join again on the matching sequence values. This sounds complex, but in reality it would be a single query like this:
with srcData as (
select row_number() over(order by t.EffectiveDate) as rn,
t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
from @TableName t
join @sequence s
on t.Code = s.Code
where t.ItemId = @item_id
)
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from srcData d
join #sequence s
on d.rn = s.sequence
and d.Code = s.Code
order by d.rn;
This alone won't guarantee a result set identical to what you are looking for, but staging the data into a temp table and adding a few simple checks around the code would do the trick (for example, a checksum validation and a sum of the code values):
declare @tempData table (rn int, TransactionId smallint, ItemId smallint, Code smallint, EffectiveDate datetime, CreateDate datetime);
with srcData as (
select row_number() over(order by t.EffectiveDate) as rn,
t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
from @TableName t
join @sequence s
on t.Code = s.Code
where t.ItemId = @item_id
)
insert @tempData
(rn, TransactionId, ItemId, Code, EffectiveDate, CreateDate)
select d.rn, d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from srcData d
join @sequence s
on d.rn = s.sequence
and d.Code = s.Code;
-- Verify we have matching hash/sums
if
(
( (select sum(Code) from @sequence) = (select sum(Code) from @tempData) )
and
( (select checksum_agg(checksum(sequence, Code)) from @sequence) = (select checksum_agg(checksum(rn, Code)) from @tempData) )
)
begin;
-- Match - return the resultset
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from @tempData d
order by d.rn;
end;
If you want to do it all inline, you could use a different approach that leverages CTEs and recursion to perform a running sum/total and an OrdPath-like comparison (though you'd still need to parse the sequence character data into a dataset):
-- Sequence data with running total
with sequenceWithRunningTotal as
(
-- Anchor
select s.sequence, s.Code, s.Code as runningTotal, cast(s.Code as varchar(8000)) as pth,
sum(s.Code) over(partition by 1) as sumCode
from @sequence s
where s.sequence = 1
-- Recurse
union all
select s.sequence, s.Code, b.runningTotal + s.Code as runningTotal,
b.pth + '.' + cast(s.Code as varchar(8000)) as pth,
b.sumCode as sumCode
from @sequence s
join sequenceWithRunningTotal b
on s.sequence = b.sequence + 1
),
-- Source data with sequence value
srcData as
(
select row_number() over(order by t.EffectiveDate) as rn,
t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
sum(t.Code) over(partition by 1) as sumCode
from @TableName t
join @sequence s
on t.Code = s.Code
where t.ItemId = @item_id
),
-- Source data with running sum
sourceWithRunningSum as
(
-- Anchor
select t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
t.Code as runningTotal, cast(t.Code as varchar(8000)) as pth,
t.sumCode
from srcData t
where t.rn = 1
-- Recurse
union all
select t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
s.runningTotal + t.Code as runningTotal,
s.pth + '.' + cast(t.Code as varchar(8000)) as pth,
t.sumCode
from srcData t
join sourceWithRunningSum s
on t.rn = s.rn + 1
)
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from sourceWithRunningSum d
join sequenceWithRunningTotal s
on d.rn = s.sequence
and d.Code = s.Code
and d.runningTotal = s.runningTotal
and d.pth = s.pth
and d.sumCode = s.sumCode
order by d.rn;
A DB approach is ideal if it's simple (i.e. no cursors or overly complicated stored procedure)
I don't believe a pure DB approach ("pure" meaning only using SQL SELECT) is practical because the type of SQL I envision would require very convoluted self-joins, field concatenation, MAX() functions, etc. This type of SQL might be a fun academic answer to a puzzle in Joe Celko's "SQL for Smarties" book but I don't think that's appropriate for production code.
I think the realistic approach is to write some kind of loop that keeps track of state. In the general sense, your problem is very similar to writing code for stateful inspection of TCP/IP packets for spam filtering, or scanning credit-card transactions for fraudulent patterns. All these problems share a characteristic: the action you take on the current row (record) depends on the records you saw previously (the context), and that requires holding state variables.
If you want to avoid round-tripping the data for analysis, Transact-SQL looks like the best option for performance. Alternatively, use hosted CLR to take advantage of C# syntax while still keeping the processing within the database engine.
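The stateful-loop idea fits in a few lines. This Python version is a sketch under assumptions: the `find_sequence` helper and the dict-per-row shape are hypothetical, and it returns only the first matching set per ItemId. For each ItemId, it tracks how much of the code pattern has matched so far. The same loop could run over a SqlDataReader in C#, or in hosted CLR, with the rows ordered by ItemId, EffectiveDate:

```python
from datetime import datetime


def find_sequence(rows, pattern):
    """Per ItemId, find TransactionIds whose Codes follow `pattern` in EffectiveDate order.

    Rows with other codes in between are skipped, and the order of `pattern`
    matters. Greedy earliest matching is enough to decide whether the pattern
    occurs as a subsequence.
    """
    state = {}    # ItemId -> (next position in pattern, TransactionIds matched so far)
    matches = {}  # ItemId -> TransactionIds of the completed match
    for r in sorted(rows, key=lambda r: (r["ItemId"], r["EffectiveDate"])):
        item = r["ItemId"]
        pos, found = state.get(item, (0, []))
        if pos < len(pattern) and r["Code"] == pattern[pos]:
            found = found + [r["TransactionId"]]
            pos += 1
            if pos == len(pattern):
                matches[item] = found
        state[item] = (pos, found)
    return matches


def make_row(tid, code, effective):
    return {"TransactionId": tid, "ItemId": 1, "Code": code, "EffectiveDate": effective}


table = [
    make_row(1, 8, datetime(2009, 12, 2, 13, 13)),
    make_row(4, 51, datetime(2009, 12, 2, 11, 8)),
    make_row(2, 14, datetime(2009, 12, 2, 11, 9)),
    make_row(3, 61, datetime(2009, 12, 3, 8, 33)),
    make_row(5, 28, datetime(2009, 12, 3, 9, 33)),
    make_row(6, 9, datetime(2009, 12, 3, 13, 58)),
]
print(find_sequence(table, [51, 61, 9]))
```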
This is just off the top of my head and is untested, so it may need some tweaking:
SELECT DISTINCT
    T.TransactionID,
    T.ItemID,
    T.Code,
    T.EffectiveDate,
    T.CreateDate
FROM
    My_Table T
INNER JOIN (
    SELECT
        T1.TransactionID AS TransactionID1,
        T2.TransactionID AS TransactionID2,
        T3.TransactionID AS TransactionID3
    FROM
        My_Table T1
    INNER JOIN My_Table T2 ON
        T2.ItemID = T1.ItemID AND
        T2.Code = 61 AND
        T2.EffectiveDate > T1.EffectiveDate
    INNER JOIN My_Table T3 ON
        T3.ItemID = T1.ItemID AND
        T3.Code = 9 AND
        T3.EffectiveDate > T2.EffectiveDate
    WHERE
        T1.Code = 51
) SQ ON
    -- T1/T2/T3 are not visible out here, so match on the subquery's aliased columns
    T.TransactionID IN (SQ.TransactionID1, SQ.TransactionID2, SQ.TransactionID3)
