Identify a specific sequence of records in a table - C#

Assume a table with the fields TransactionId, ItemId, Code, EffectiveDate, and CreateDate.
+---------------+--------+------+--------------------+--------------------+
| TransactionId | ItemId | Code | EffectiveDate      | CreateDate         |
+---------------+--------+------+--------------------+--------------------+
|             1 |      1 |    8 | 12/2/2009 1:13 PM  | 12/2/2009 1:13 PM  |
|             4 |      1 |   51 | 12/2/2009 11:08 AM | 12/3/2009 9:01 AM  |
|             2 |      1 |   14 | 12/2/2009 11:09 AM | 12/2/2009 11:09 AM |
|             3 |      1 |   61 | 12/3/2009 8:33 AM  | 12/3/2009 8:33 AM  |
|             5 |      1 |   28 | 12/3/2009 9:33 AM  | 12/3/2009 9:33 AM  |
|             6 |      1 |    9 | 12/3/2009 1:58 PM  | 12/3/2009 1:58 PM  |
+---------------+--------+------+--------------------+--------------------+
I need to get the set of records where the sequence 51, 61, 9 occurs for a given ItemId when sorted by EffectiveDate. There could be records with other codes in between these records.
In this case, I would return TransactionIds 4, 3, and 6, as shown below.
+---------------+--------+------+--------------------+--------------------+
| TransactionId | ItemId | Code | EffectiveDate      | CreateDate         |
+---------------+--------+------+--------------------+--------------------+
|             4 |      1 |   51 | 12/2/2009 11:08 AM | 12/3/2009 9:01 AM  |
|             3 |      1 |   61 | 12/3/2009 8:33 AM  | 12/3/2009 8:33 AM  |
|             6 |      1 |    9 | 12/3/2009 1:58 PM  | 12/3/2009 1:58 PM  |
+---------------+--------+------+--------------------+--------------------+
Note that:
This isn't the only sequence I'll need to identify, but it illustrates the problem.
Records can be inserted into the table out of order; that is, the 61 record could have been inserted first, followed by the 51, and then the 9. You can see this in the example: for the code 51 record, the CreateDate is later than the EffectiveDate.
The order of the sequence matters. So, the sequence 61, 9, 51 would not return any records, but 51, 61, 9 would.
A DB approach is ideal if it's simple (i.e. no cursors or overly complicated stored procedure), but a code approach could also work, although it would result in a significant amount of data transfer out of the DB.
The environment is SQL Server 2005 and C#/.NET 3.5.

Actually, you could get a couple of fairly simple solutions leveraging ranking/windowing functions and/or CTEs and recursive CTEs.
Create a procedure that accepts a character-based, comma-separated list of the Code values you are looking for, in the sequence you want them in. Use any of a dozen possible ways to split this list into a table/set of (sequence, Code) pairs, resulting in a table with a structure like this:
declare @sequence table (sequence int not null, Code int not null);
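For illustration, here is a minimal client-side sketch (hypothetical names; the split could just as well be done in T-SQL) of turning the comma-separated list into those (sequence, Code) pairs:

```csharp
using System;
using System.Linq;

static class SequenceSplit
{
    // Split "51,61,9" into (sequence, Code) pairs mirroring the
    // sequence table above: sequence 1 -> 51, 2 -> 61, 3 -> 9.
    public static (int Sequence, int Code)[] Parse(string csv)
    {
        return csv.Split(',')
                  .Select((s, i) => (Sequence: i + 1, Code: int.Parse(s.Trim())))
                  .ToArray();
    }
}
```

Parse("51, 61, 9") produces (1, 51), (2, 61), (3, 9) - exactly the rows you would insert into the sequence table.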
Once you have this, it's simply a matter of sequencing the source set by joining the sequence table to the source table on matching Code values for the given ItemId. Once the source set is filtered and sequenced, join again on the matching sequence values. This sounds complex, but in reality it is a single query like this:
with srcData as (
    select row_number() over(order by t.EffectiveDate) as rn,
           t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
    from @TableName t
    join @sequence s
        on t.Code = s.Code
    where t.ItemId = @item_id
)
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from srcData d
join @sequence s
    on d.rn = s.sequence
    and d.Code = s.Code
order by d.rn;
This alone won't guarantee a result set identical to what you are looking for, but staging the data into a temp table and adding a few simple checks around it will do the trick (for example, add a checksum validation and a sum of the Code values):
declare @tempData table (rn int, TransactionId smallint, ItemId smallint, Code smallint, EffectiveDate datetime, CreateDate datetime);

with srcData as (
    select row_number() over(order by t.EffectiveDate) as rn,
           t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
    from @TableName t
    join @sequence s
        on t.Code = s.Code
    where t.ItemId = @item_id
)
insert @tempData
    (rn, TransactionId, ItemId, Code, EffectiveDate, CreateDate)
select d.rn, d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from srcData d
join @sequence s
    on d.rn = s.sequence
    and d.Code = s.Code;

-- Verify we have matching hash/sums
if
(
    ( (select sum(Code) from @sequence) = (select sum(Code) from @tempData) )
    and
    ( (select checksum_agg(checksum(sequence, Code)) from @sequence) = (select checksum_agg(checksum(rn, Code)) from @tempData) )
)
begin;
    -- Match - return the resultset
    select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
    from @tempData d
    order by d.rn;
end;
If you want to do it all inline, you could use a different approach leveraging CTEs and recursion to perform a running sum/total and an OrdPath-like comparison as well (though you'd still need to parse the sequence character data out into a dataset):
-- Sequence data with running total
with sequenceWithRunningTotal as
(
    -- Anchor
    select s.sequence, s.Code, s.Code as runningTotal, cast(s.Code as varchar(8000)) as pth,
           sum(s.Code) over(partition by 1) as sumCode
    from @sequence s
    where s.sequence = 1
    -- Recurse
    union all
    select s.sequence, s.Code, b.runningTotal + s.Code as runningTotal,
           b.pth + '.' + cast(s.Code as varchar(8000)) as pth,
           b.sumCode as sumCode
    from @sequence s
    join sequenceWithRunningTotal b
        on s.sequence = b.sequence + 1
),
-- Source data with sequence value
srcData as
(
    select row_number() over(order by t.EffectiveDate) as rn,
           t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
           sum(t.Code) over(partition by 1) as sumCode
    from @TableName t
    join @sequence s
        on t.Code = s.Code
    where t.ItemId = @item_id
),
-- Source data with running sum
sourceWithRunningSum as
(
    -- Anchor
    select t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
           t.Code as runningTotal, cast(t.Code as varchar(8000)) as pth,
           t.sumCode
    from srcData t
    where t.rn = 1
    -- Recurse
    union all
    select t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
           s.runningTotal + t.Code as runningTotal,
           s.pth + '.' + cast(t.Code as varchar(8000)) as pth,
           t.sumCode
    from srcData t
    join sourceWithRunningSum s
        on t.rn = s.rn + 1
)
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from sourceWithRunningSum d
join sequenceWithRunningTotal s
    on d.rn = s.sequence
    and d.Code = s.Code
    and d.runningTotal = s.runningTotal
    and d.pth = s.pth
    and d.sumCode = s.sumCode
order by d.rn;

A DB approach is ideal if it's simple (i.e. no cursors or overly complicated stored procedure)
I don't believe a pure DB approach ("pure" meaning only using SQL SELECT) is practical because the type of SQL I envision would require very convoluted self-joins, field concatenation, MAX() functions, etc. This type of SQL might be a fun academic answer to a puzzle in Joe Celko's "SQL for Smarties" book but I don't think that's appropriate for production code.
I think the realistic approach is to write some kind of loop that keeps track of state. Your problem, in the general sense, is very similar to writing code for stateful inspection of TCP/IP packets for spam filtering, or scanning credit-card transactions for fraudulent patterns. All these problems share similar characteristics: the actions you take on the current row (record) depend on what records you saw previously (the context), and that aspect requires holding state variables.
If you want to avoid round-tripping the data for analysis, it looks like Transact-SQL is the best way for performance. Or use hosted CLR to take advantage of C# syntax while still keeping the processing within the database engine.
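As a sketch of that stateful-loop idea in C# (the Tx class and Find method are hypothetical names, not from the question): sort by EffectiveDate, keep a cursor into the target code sequence, and advance it on each match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Tx
{
    public int TransactionId;
    public int Code;
    public DateTime EffectiveDate;
}

static class SequenceFinder
{
    // Returns the TransactionIds forming the code sequence (in EffectiveDate
    // order), or an empty list if the full sequence never occurs.
    public static List<int> Find(IEnumerable<Tx> rows, int[] codeSequence)
    {
        var matched = new List<int>();
        int next = 0; // state: index of the next code we are looking for
        foreach (var row in rows.OrderBy(r => r.EffectiveDate))
        {
            if (next < codeSequence.Length && row.Code == codeSequence[next])
            {
                matched.Add(row.TransactionId);
                next++;
            }
        }
        return next == codeSequence.Length ? matched : new List<int>();
    }
}
```

With the question's sample rows, Find(rows, new[] { 51, 61, 9 }) yields transactions 4, 3, 6, while the out-of-order sequence 61, 9, 51 yields an empty list, matching the stated requirements.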

This is just off the top of my head and is untested, so it may need some tweaking:
SELECT DISTINCT
    T.TransactionID,
    T.ItemID,
    T.Code,
    T.EffectiveDate,
    T.CreateDate
FROM
    My_Table T
INNER JOIN (
    SELECT
        T1.TransactionID AS TID1,
        T2.TransactionID AS TID2,
        T3.TransactionID AS TID3
    FROM
        My_Table T1
    INNER JOIN My_Table T2 ON
        T2.ItemID = T1.ItemID AND
        T2.Code = 61 AND
        T2.EffectiveDate > T1.EffectiveDate
    INNER JOIN My_Table T3 ON
        T3.ItemID = T1.ItemID AND
        T3.Code = 9 AND
        T3.EffectiveDate > T2.EffectiveDate
    WHERE
        T1.Code = 51
) SQ ON
    T.TransactionID IN (SQ.TID1, SQ.TID2, SQ.TID3)


Best way to calculate the number of elements that do NOT appear in another table

Consider the following situation
TableA:
+---------+---------+----------+----------+
| Column1 | Column2 | Column3  | Column4  |
+---------+---------+----------+----------+
| zzzxxxx | NULL    | NULL     | zzzyyyy  |
| zzzqqqq | NULL    | SomeText | NULL     |
| NULL    | zzzxxxx | zzzxxxx  | NULL     |
| zzzyyyy | zzzyyyy | zzzwwww  | SomeText |
+---------+---------+----------+----------+

TableB:
+---------+
| entB    |
+---------+
| zzzxxxx |
| zzzyyyy |
| zzzwwww |
+---------+
where z, y, x are digits 1-9 and SomeText can contain any numbers or letters. TableB can't have NULL values in the entB column.
I need to find the total number of values that are in TableA but not in TableB. The columns in TableA do not contain unique (distinct) values and may be NULL.
My first attempt was the following query:
$"select count(1) from " +
$"(" +
$" select distinct Column1 from {TableA} where Column1 not in (select entB from {TableB})" +
$" union" +
$" select distinct Column2 from {TableA} where Column2 not in (select entB from {TableB})" +
$" union" +
$" select distinct Column3 from {TableA} where Column3 not in (select entB from {TableB})" +
$" union" +
$" select distinct Column4 from {TableA} where Column4 not in (select entB from {TableB})" +
$") as t"
This was fine until I had to test it on a TableA with ~70,000,000 rows and ~100,000 rows in TableB, where this query took far too long to execute. I am looking for a way to decrease the time.
I read that using DISTINCT and UNION is an easy way to kill performance, so I was thinking of trying something like this:
SELECT Column1
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE a.Column1 = b.entB)
AND Column1 IS NOT NULL
then get the result, save it in a DataTable, repeat the same query for the other 3 columns, and merge the results, checking for duplicates in memory.
Do you know if there is a better solution?
EDIT: I have edited the table to show better what my data looks like. In the example, I expect the result "2", since there are 2 values (SomeText and zzzqqqq) that are not present in TableB.
Once we get past all the grumbling about how TableA isn't normalized, this isn't hard to do.
I guess you want a count of the values of your four columns in TableA that don't match TableB. If you want something more complex, with respect, take the time to figure out how to describe it very precisely.
Start with a subquery that gives you the values from TableA to compare. Because we use UNION rather than UNION ALL, we get SELECT DISTINCT for free. (SQL manipulates sets.)
SELECT Column1 AS ent FROM TableA
UNION
SELECT Column2 AS ent FROM TableA
UNION
SELECT Column3 AS ent FROM TableA
UNION
SELECT Column4 AS ent FROM TableA
Then, use the LEFT JOIN .... IS NULL pattern to get the items that don't match.
SELECT COUNT(*) number_of_unmatched_items
FROM ( SELECT Column1 AS ent FROM TableA
UNION
SELECT Column2 AS ent FROM TableA
UNION
SELECT Column3 AS ent FROM TableA
UNION
SELECT Column4 AS ent FROM TableA
) a
LEFT JOIN TableB b ON a.ent = b.entB
WHERE b.entB IS NULL
AND a.ent IS NOT NULL -- a NULL in TableA never matches, so exclude it from the count
That WHERE...IS NULL picks up the rows from your subquery that failed the ON condition in the left join.
To make this decently fast, I think you will need separate indexes on each column from TableA that's involved in this, as well as an index on entB in TableB. But you'll need to try it out, and do EXPLAIN if it still doesn't meet your performance needs.
Unless the machine running MySQL is really short on RAM, MySQL should handle this stuff reasonably efficiently.
Pro tip: You already know this. Denormalized tables like TableA can really mess up query performance.
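For comparison, the in-memory merge the asker describes can be sketched in C# with HashSets (hypothetical names; the rows are assumed to have been read out of the database already, e.g. via a DataReader):

```csharp
using System;
using System.Collections.Generic;

static class UnmatchedCounter
{
    // Count the distinct non-null values appearing in any of the four
    // columns of TableA that never appear in TableB.entB.
    public static int Count(IEnumerable<string[]> tableARows, IEnumerable<string> entB)
    {
        var known = new HashSet<string>(entB);   // TableB values
        var unmatched = new HashSet<string>();   // de-duplicates for free
        foreach (var row in tableARows)
            foreach (var value in row)
                if (value != null && !known.Contains(value))
                    unmatched.Add(value);
        return unmatched.Count;
    }
}
```

With the question's sample data this returns 2 (SomeText and zzzqqqq), matching the asker's expected result; the trade-off versus the SQL answer is shipping all four columns over the wire.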

Function: set table name as parameter and get column names and table name from an XML file in SqlCommand C# [duplicate]

Here is my SQL query below. I want to select values from the column names given as variables. Is there any appropriate way of doing this except using a dynamic query?
SELECT EPV.EmployeeCode, @RateOfEmployee, @RateOfEmployer
FROM [HR_EmployeeProvisions] EPV
One way to do this without dynamic SQL is a CASE expression, but it is ugly:
SELECT EPV.EmployeeCode,
       case @RateOfEmployee when 'RateOfEmployee' then RateOfEmployee
                            when 'X' then X
                            ..
       end,
       case @RateOfEmployer when 'RateOfEmployer' then RateOfEmployer
                            when 'Y' then Y
                            ..
       end
FROM [HR_EmployeeProvisions] EPV
You have to enumerate all the columns in the CASE expression.
You can't parameterize identifiers in SQL Server, and I doubt it's possible in any other relational database.
Your best choice is to use dynamic SQL.
Note that dynamic SQL is very often a security hazard, and you must defend your code against SQL injection attacks.
I would probably do something like this:
Declare @Sql nvarchar(500)
Declare @numberOfColumns int;

select @numberOfColumns = count(1)
from information_schema.columns
where table_name = 'HR_EmployeeProvisions'
and column_name IN(@RateOfEmployee, @RateOfEmployer)

if @numberOfColumns = 2 begin
    Select @Sql = 'SELECT EmployeeCode, '+ QUOTENAME(@RateOfEmployee) +' ,'+ QUOTENAME(@RateOfEmployer) +
                  ' FROM HR_EmployeeProvisions'
    exec(@Sql)
end
This way you make sure that the column names actually exist in the table, while also using QUOTENAME as another layer of safety.
Note: your presentation layer should handle the case where the select is not performed because the column names are invalid.
Have a look at the UNPIVOT clause - I'm not sure it's applicable to your case, but in some circumstances it can be used to query a value by column name without dynamic SQL:
create table t1 (
a int,
b int,
c int
);
insert into t1 values
(1, 11, 111),
(2, 22, 222),
(3, 33, 333);
select a, col_name, col_value from t1
unpivot (col_value for col_name in (b, c)) as dt;
Result:
| a | col_name | col_value |
|---|----------|-----------|
| 1 | b | 11 |
| 1 | c | 111 |
| 2 | b | 22 |
| 2 | c | 222 |
| 3 | b | 33 |
| 3 | c | 333 |
(SQL Fiddle)
If you only need a value depending on some condition on (dynamically) either b or c, you can build the condition on that. If you need either the value in column b or c, you can add ... WHERE col_name = ?. If you need more columns, you'd probably need to filter the column values on the un-pivoted table and then pivot it again to get the values back into columns.

Count Consecutive vacation days skip through holidays and weekends

I have a table which has records of user's vacation days.
A Sample of that would be:
+---------+-----------+---------+------------+
| country | user_name | user_id | vac_date |
+---------+-----------+---------+------------+
| canada | James | 1111 | 2015-02-13 |
| canada | James | 1111 | 2015-02-17 |
| canada | James | 1111 | 2015-02-18 |
| canada | James | 1111 | 2015-02-10 |
| canada | James | 1111 | 2015-02-11 |
+---------+-----------+---------+------------+
With the above data, the count would be 3 from feb 13th to feb 18th, because 14th and 15th are weekends and the 16th is a holiday here in Canada. So essentially, I am trying to hold and continue the count if the user took the next working day off. I also have a table that has all the holidays which includes the country and the date of the holiday. Sample data for the holiday table would be:
+---------+-------------+-------------+
| country | holidayDesc | holidayDate |
+---------+-------------+-------------+
| canada | Family Day | 2015-02-16 |
+---------+-------------+-------------+
Currently I have a query in SQL that counts the dates normally, so it only counts whatever is in the vacation table. For example: if a user took March 3rd 2015, March 4th 2015, and March 5th 2015 off, then it will have a count of 3; but for the above table example, it would only have a count of 1 for Feb 13th and 2 from Feb 17th to Feb 18th.
SELECT DISTINCT user_name
,min(vac_date) as startDate
,max(vac_date) as endDate
,datediff(day, min(vac_date), max(vac_date)) as consecutiveCount
FROM (
SELECT user_name
,vac_date
,user_id
,groupDate = DATEADD(DAY, - ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY vac_date
), vac_date)
FROM mytable
WHERE country = 'canada'
AND vac_date BETWEEN '20150101'
AND '20151231'
) z
GROUP BY user_name
,groupDate
HAVING datediff(day, min(vac_date), max(vac_date)) >= 0
ORDER BY user_name
,min(vac_date);
This is what it currently outputs from the above sample data:
+-----------+------------+------------+------------------+
| user_name | startDate | endDate | consecutiveCount |
+-----------+------------+------------+------------------+
| James | 2015-02-10 | 2015-02-11 | 2 |
| James | 2015-02-13 | 2015-02-13 | 1 |
| James | 2015-02-17 | 2015-02-18 | 2 |
+-----------+------------+------------+------------------+
Ideally i would like it to be:
+-----------+------------+------------+------------------+
| user_name | startDate | endDate | consecutiveCount |
+-----------+------------+------------+------------------+
| James | 2015-02-10 | 2015-02-11 | 2 |
| James | 2015-02-13 | 2015-02-18 | 3 |
+-----------+------------+------------+------------------+
But I don't know if that is possible with pure SQL. I can also try to incorporate it into C#.
If it helps, I am also using C# and SQL Server Management Studio. Any help would be appreciated. Thanks in advance.
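Since C# is on the table, the desired grouping can be sketched as follows (hypothetical names; the holiday set is assumed to be loaded already): walk the vacation dates in order and bridge any gap that consists only of weekends and holidays.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VacationGroups
{
    static bool IsNonWorking(DateTime d, HashSet<DateTime> holidays) =>
        d.DayOfWeek == DayOfWeek.Saturday ||
        d.DayOfWeek == DayOfWeek.Sunday ||
        holidays.Contains(d);

    // Groups vacation dates into spans, bridging weekends and holidays.
    // Each span is (startDate, endDate, count of actual vacation days).
    public static List<(DateTime Start, DateTime End, int Count)> Group(
        IEnumerable<DateTime> vacDates, HashSet<DateTime> holidays)
    {
        var result = new List<(DateTime, DateTime, int)>();
        foreach (var d in vacDates.Distinct().OrderBy(x => x))
        {
            if (result.Count > 0)
            {
                // Is d the next working day after the last span's end?
                var next = result[^1].Item2.AddDays(1);
                while (IsNonWorking(next, holidays)) next = next.AddDays(1);
                if (next == d)
                {
                    var last = result[^1];
                    result[^1] = (last.Item1, d, last.Item3 + 1);
                    continue;
                }
            }
            result.Add((d, d, 1)); // start a new span
        }
        return result;
    }
}
```

With the sample data (vacation days Feb 10, 11, 13, 17, 18 of 2015 and Family Day on Feb 16 as the only holiday), this yields the spans (02-10, 02-11, 2) and (02-13, 02-18, 3), matching the ideal output in the question.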
I tried to go a different route, but then found the fix for John Cappelletti's solution.
First you need to add weekend dates to your holiday table.
Get a list of dates between two dates using a function
Then UNION ALL vacation days with holidays, but add a description field so you can distinguish between the two.
There are some CROSS JOINs so you have holidays and weekends for each country and user (needs testing):
SELECT [country],
[user_name], [user_id], [vac_date], 'vacation' as description
FROM vacations
UNION ALL
SELECT c.[country],
u.[user_name],
u.[user_id],
[holidayDate],
'holiday' as description
FROM holidays
CROSS JOIN (SELECT DISTINCT [country] FROM vacations) c
CROSS JOIN (SELECT DISTINCT [user_name], [user_id] FROM vacations) u
Then the final query is the same as John suggested, but this time you only count vacation days.
WITH joinDates as (
SELECT [country],
[user_name], [user_id], [vac_date], 'vacation' as description
FROM vacations
UNION ALL
SELECT c.[country],
u.[user_name],
u.[user_id],
[holidayDate],
'holiday' as description
FROM holidays
CROSS JOIN (SELECT DISTINCT [country] FROM vacations) c
CROSS JOIN (SELECT DISTINCT [user_name], [user_id] FROM vacations) u
)
Select user_name
,startDate = min(vac_date)
,endDate = max(vac_date)
,consecutiveCount = count(*)
From (
Select *
,Grp = Day(vac_date) - Row_Number() over (Partition By country,user_id
Order by vac_date)
From joinDates S
) A
WHERE description = 'vacation' -- only count vacation days ignore holiday/weekend
Group By user_name, Grp
Having count(*)>1
ORDER BY startDate
This seems like a classic Gaps & Islands with a little twist.
Declare @YourTable table (country varchar(25),user_name varchar(25),user_id varchar(25),vac_date date)
Insert Into @YourTable values
('canada','James','1111','2015-02-13'),
('canada','James','1111','2015-02-17'),
('canada','James','1111','2015-02-18'),
('canada','James','1111','2015-02-10'),
('canada','James','1111','2015-02-11')

Declare @Holiday table (country varchar(25),holidayDate date)
Insert Into @Holiday values
('canada','2015-02-16')

Select user_name
      ,startDate = min(vac_date)
      ,endDate = max(vac_date)
      ,consecutiveCount = sum(DayCnt)
From (
      Select *
            ,Grp = Day(vac_date) - Row_Number() over (Partition By country,user_id Order by vac_date)
      From (Select Country,user_name,user_id,vac_date,DayCnt=1 from @YourTable
            Union All
            Select A.Country,user_name,user_id,vac_date=b.holidayDate,DayCnt=1
            From @YourTable A
            Join @Holiday B on A.country=B.country and abs(DateDiff(DD,vac_date,holidayDate))=1
            Union All
            Select A.Country,user_name,user_id,vac_date=b.retval,DayCnt=0
            From @YourTable A
            Join (
                  Select * From [dbo].[udf-Range-Date]('2015-01-01','2017-12-31','DD',1) where DateName(WEEKDAY,RetVal) in ('Saturday','Sunday')
                 ) B on abs(DateDiff(DD,vac_date,RetVal))=1
           ) S
     ) A
Group By user_name,Grp
Having Sum(DayCnt)>1
Returns
user_name startDate endDate consecutiveCount
James 2015-02-10 2015-02-11 2
James 2015-02-16 2015-02-18 3
The UDF to generate dynamic Date Ranges -- could be your own query
CREATE FUNCTION [dbo].[udf-Range-Date] (@R1 datetime,@R2 datetime,@Part varchar(10),@Incr int)
Returns Table
Return (
with cte0(M) As (Select 1+Case @Part When 'YY' then DateDiff(YY,@R1,@R2)/@Incr When 'QQ' then DateDiff(QQ,@R1,@R2)/@Incr When 'MM' then DateDiff(MM,@R1,@R2)/@Incr When 'WK' then DateDiff(WK,@R1,@R2)/@Incr When 'DD' then DateDiff(DD,@R1,@R2)/@Incr When 'HH' then DateDiff(HH,@R1,@R2)/@Incr When 'MI' then DateDiff(MI,@R1,@R2)/@Incr When 'SS' then DateDiff(SS,@R1,@R2)/@Incr End),
cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (Select M from cte0) Row_Number() over (Order By (Select NULL)) From cte1 a, cte1 b, cte1 c, cte1 d, cte1 e, cte1 f, cte1 g, cte1 h ),
cte3(N,D) As (Select 0,@R1 Union All Select N,Case @Part When 'YY' then DateAdd(YY, N*@Incr, @R1) When 'QQ' then DateAdd(QQ, N*@Incr, @R1) When 'MM' then DateAdd(MM, N*@Incr, @R1) When 'WK' then DateAdd(WK, N*@Incr, @R1) When 'DD' then DateAdd(DD, N*@Incr, @R1) When 'HH' then DateAdd(HH, N*@Incr, @R1) When 'MI' then DateAdd(MI, N*@Incr, @R1) When 'SS' then DateAdd(SS, N*@Incr, @R1) End From cte2 )
Select RetSeq = N+1
      ,RetVal = D
From cte3,cte0
Where D<=@R2
)
/*
Max 100 million observations -- Date Parts YY QQ MM WK DD HH MI SS
Syntax:
Select * from [dbo].[udf-Range-Date]('2016-10-01','2020-10-01','YY',1)
Select * from [dbo].[udf-Range-Date]('2016-01-01','2017-01-01','MM',1)
*/
OK, my understanding of the question is that what you want to do is count a span of days off as a single absence. Many businesses call this an "occurrence of absence" to differentiate absences by cause. In this case, you're trying to treat weekends and holidays as a continuation of the absence: if a holiday falls on a Friday but the person takes the following Monday off, that should be one contiguous time out.
Personally, I'd do this in C# because of properties of the DateTime object that could make this a lot easier than trying to make a frankenquery. The code below assumes that you have an object called an Employee that contains its own record of DateTimes, like so:
public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public List<DateTime> DaysIWasOut { get; set; }
}
// Requires System.Data and System.Linq (DataTable / AsEnumerable).
public static int TimeOut(IEnumerable<Employee> employees)
{
    int totalOutInstances = 0;
    DataTable dt = HolidaysPlease(); // this refers to another method to fill
    // the holiday table. Just a basic SqlDataAdapter.Fill kind of thing.
    // Basic so I won't waste time on it here.
    // holidayDate is the third column of the holiday table.
    var holidays = new HashSet<DateTime>(
        dt.AsEnumerable().Select(t => Convert.ToDateTime(t[2]).Date));

    foreach (var e in employees)
    {
        totalOutInstances += e.DaysIWasOut.Count;
        foreach (var d in e.DaysIWasOut)
        {
            // Find the next working day after d, skipping weekends and holidays.
            var next = d.AddDays(1);
            while (next.DayOfWeek == DayOfWeek.Saturday ||
                   next.DayOfWeek == DayOfWeek.Sunday ||
                   holidays.Contains(next))
            {
                next = next.AddDays(1);
            }
            // If the employee was also out the next working day, it's the
            // same occurrence of absence - don't count that day.
            if (e.DaysIWasOut.Contains(next))
            {
                totalOutInstances--;
            }
        }
    }
    return totalOutInstances;
}

combine multiple sql rows with different columns

Okay so say I have something like this:
ID | Name | Address
1 | Bob | 123 Fake Street
1 | Bob | 221 Other Street
done by doing something like:
select p.ID, p.Name, a.Address from People p
inner join Addresses a on a.OwnerID = p.ID
Is there any way to turn that into
ID | Name | Address_1 | Address_2 | etc...
1 | Bob | 123 Fake Street | 221 Other street | etc
I've seen things that do comma-separated values in one column, but I don't want that; I want distinct columns. I am querying this using MSSQL and C#, though I don't know if that changes anything. Also, this is a made-up scenario that is just similar to what I'm doing, so the actual structure of the tables can't be changed.
Anyone have any suggestions?
You can use the PIVOT function to get the result, but you will also have to apply row_number() so that you can convert multiple addresses per person into columns.
If you had a known number of addresses, then you would hard-code the query:
select id, name, address_1, address_2
from
(
select p.id, p.name, a.address,
'Address_'+cast(row_number() over(partition by p.id
order by a.ownerid) as varchar(10)) rn
from people p
inner join addresses a
on p.id = a.ownerid
) d
pivot
(
max(address)
for rn in (address_1, address_2)
) piv;
See SQL Fiddle with Demo.
But in your case, you will have an unknown number of addresses per person, so you will want to use dynamic SQL and place it into a stored procedure to execute:
DECLARE @cols AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)

select @cols = STUFF((SELECT distinct ',' + QUOTENAME('Address_'+d.rn)
                      from
                      (
                        select cast(row_number() over(partition by a.ownerid
                                                      order by a.ownerid) as varchar(10)) rn
                        from Addresses a
                      ) d
                      FOR XML PATH(''), TYPE
                      ).value('.', 'NVARCHAR(MAX)')
              ,1,1,'')

set @query = 'SELECT id, name, ' + @cols + '
             from
             (
                select p.id, p.name, a.address,
                       ''Address_''+cast(row_number() over(partition by p.id
                                                           order by a.ownerid) as varchar(10)) rn
                from people p
                inner join addresses a
                  on p.id = a.ownerid
             ) d
             pivot
             (
                max(address)
                for rn in (' + @cols + ')
             ) p '

execute(@query);
See SQL Fiddle with Demo. These both give a result:
| ID | NAME | ADDRESS_1 | ADDRESS_2 | ADDRESS_3 |
----------------------------------------------------------------
| 1 | Bob | 123 Fake Street | 221 Other Street | (null) |
| 2 | Jim | 123 e main street | (null) | (null) |
| 3 | Tim | 489 North Drive | 56 June Street | 415 Lost |

Dynamically Pivot unknown Data

Trying to pivot dynamic data using LINQ or lambda expressions in C#/MVC4, and I have pretty much come to the conclusion that it's very difficult to do.
This is basically what I want to do:
I have been able to get this to work with known column names using this example: http://geekswithblogs.net/malisancube/archive/2009/04/21/using-lambda-or-linq-for-pivot-tables.aspx
But I can't find any examples of doing this with dynamic columns.
By dynamic columns I mean that, at any time, there could be a new row with a Name and FieldType that has not been in the table before, which also needs to be turned into a column. Any pointers?
I don't know LINQ, so I will give you a version that can be used in a SQL Server stored procedure. This type of data transformation is known as a PIVOT. Since you are using SQL Server 2008+, you can use the PIVOT function.
If you know the values that you want to transform, then you can hard-code the values:
SELECT nodeid, rowid,[FirstName], [LastName], [Title]
FROM
(
SELECT nodeid, rowid, name, value
FROM yourTable
) x
PIVOT
(
max(value)
for name in ([FirstName], [LastName], [Title])
)p
See SQL Fiddle with Demo
Then if you have an unknown number of values, you can implement dynamic SQL:
DECLARE @cols AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)

select @cols = STUFF((SELECT distinct ',' + QUOTENAME(name)
                      from yourtable
                      FOR XML PATH(''), TYPE
                      ).value('.', 'NVARCHAR(MAX)')
              ,1,1,'')

set @query = 'SELECT nodeid, rowid,' + @cols + ' from
             (
                SELECT nodeid, rowid, name, value
                FROM yourTable
             ) x
             pivot
             (
                max(value)
                for name in (' + @cols + ')
             ) p '

execute(@query)
See SQL Fiddle with Demo
Both return the results:
| NODEID | ROWID | FIRSTNAME | LASTNAME | TITLE |
--------------------------------------------------
| 1 | 1 | Alfred | Beagle | (null) |
| 1 | 2 | Freddy | (null) | (null) |
| 1 | 3 | (null) | Grey | Sir. |
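Since the question asked for LINQ, here is a hedged client-side sketch of the same transformation (hypothetical names; a dictionary per output row stands in for real dynamic columns, which is usually how they end up represented in C#):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DynamicPivot
{
    // Pivot (nodeid, rowid, name, value) tuples: one output row per
    // (nodeid, rowid) pair, with an entry per distinct name; names that
    // never occur for a row map to null, mirroring PIVOT's NULLs.
    public static List<Dictionary<string, string>> Pivot(
        IEnumerable<(int NodeId, int RowId, string Name, string Value)> source)
    {
        var columns = source.Select(r => r.Name).Distinct().ToList();
        return source
            .GroupBy(r => (r.NodeId, r.RowId))
            .Select(g =>
            {
                var row = new Dictionary<string, string>
                {
                    ["nodeid"] = g.Key.NodeId.ToString(),
                    ["rowid"] = g.Key.RowId.ToString()
                };
                foreach (var c in columns)
                    row[c] = g.Where(r => r.Name == c)
                              .Select(r => r.Value)
                              .FirstOrDefault();
                return row;
            })
            .ToList();
    }
}
```

Fed the (nodeid, rowid, name, value) rows from the example above, this produces three dictionaries equivalent to the SQL result set; new Name values simply show up as extra keys, which is the "dynamic column" behavior the asker wanted.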
