Consider the following situation:
TableA TableB
+------------+----------+----------+----------+ +---------+
|Column1 | Column2 | Column3 | Column4 | | entB |
+------------+----------+----------+----------+ +---------+
| zzzxxxx | NULL | NULL | zzzyyyy | | zzzxxxx |
+------------+----------+----------+----------+ +---------+
| zzzqqqq | NULL | SomeText | NULL | | zzzyyyy |
+------------+----------+----------+----------+ +---------+
| NULL | zzzxxxx | zzzxxxx | NULL | | zzzwwww |
+------------+----------+----------+----------+ +---------+
| zzzyyyy | zzzyyyy | zzzwwww | SomeText |
+------------+----------+----------+----------+
where z, y, x are digits 1-9 and SomeText can contain any digits or letters. TableB can't have NULL values in the entB column.
I need to count the distinct values that appear in TableA but not in TableB. The columns in TableA do not contain unique (distinct) values and may be NULL.
My first attempt was the following query:
$"select count(1) from " +
$"(" +
$" select distinct Column1 from {TableA} where Column1 not in (select entB from {TableB})" +
$" union" +
$" select distinct Column2 from {TableA} where Column2 not in (select entB from {TableB})" +
$" union" +
$" select distinct Column3 from {TableA} where Column3 not in (select entB from {TableB})" +
$" union" +
$" select distinct Column4 from {TableA} where Column4 not in (select entB from {TableB})" +
$") as t"
This was fine until I had to test it with ~70,000,000 rows in TableA and ~100,000 rows in TableB, where the query took far too long to execute. I am looking for a way to reduce the run time.
I read that using DISTINCT and UNION is an easy way to kill performance, so I was thinking of trying something like this:
SELECT Column1
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE a.Column1 = b.entB and a.Column1 is not null )
and Column1 is not null
I would get the result, save it in a DataTable, then repeat the same query for the other 3 columns and merge the results, removing duplicates in memory.
Do you know if there is a better solution?
EDIT: I have edited the tables to better show what my data looks like. In the example, I expect the result "2", since there are 2 values (SomeText and zzzqqqq) that are not present in TableB.
Once we get past all the grumbling about how TableA isn't normalized, this isn't hard to do.
I guess you want a count of the values of your four columns in TableA that don't match TableB. If you want something more complex, with respect, take the time to figure out how to describe it very precisely.
Start with a subquery that gives you the values from TableA to compare. Because we use UNION rather than UNION ALL, we get SELECT DISTINCT for free. (SQL manipulates sets.)
SELECT Column1 AS ent FROM TableA
UNION
SELECT Column2 AS ent FROM TableA
UNION
SELECT Column3 AS ent FROM TableA
UNION
SELECT Column4 AS ent FROM TableA
Then, use the LEFT JOIN .... IS NULL pattern to get the items that don't match.
SELECT COUNT(*) number_of_unmatched_items
FROM ( SELECT Column1 AS ent FROM TableA
UNION
SELECT Column2 AS ent FROM TableA
UNION
SELECT Column3 AS ent FROM TableA
UNION
SELECT Column4 AS ent FROM TableA
) a
LEFT JOIN TableB b ON a.ent = b.entB
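WHERE b.entB IS NULL
AND a.ent IS NOT NULL -- the UNION keeps one NULL row; don't count it as a value
That WHERE...IS NULL picks up the rows from your subquery that failed the ON condition in the left join. A NULL from TableA never matches anything in the join, so the extra a.ent IS NOT NULL condition keeps it out of the count.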
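To check the anti-join against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The literal strings stand in for the question's placeholder values, and the IS NOT NULL filter is there because the sample data contains NULLs. It is only meant to confirm the expected count, not to say anything about performance at 70 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Column1 TEXT, Column2 TEXT, Column3 TEXT, Column4 TEXT);
    CREATE TABLE TableB (entB TEXT NOT NULL);
    INSERT INTO TableA VALUES
        ('zzzxxxx', NULL,      NULL,       'zzzyyyy'),
        ('zzzqqqq', NULL,      'SomeText', NULL),
        (NULL,      'zzzxxxx', 'zzzxxxx',  NULL),
        ('zzzyyyy', 'zzzyyyy', 'zzzwwww',  'SomeText');
    INSERT INTO TableB VALUES ('zzzxxxx'), ('zzzyyyy'), ('zzzwwww');
""")

ANTI_JOIN = """
    SELECT COUNT(*)
    FROM (
        SELECT Column1 AS ent FROM TableA
        UNION
        SELECT Column2 FROM TableA
        UNION
        SELECT Column3 FROM TableA
        UNION
        SELECT Column4 FROM TableA
    ) a
    LEFT JOIN TableB b ON a.ent = b.entB
    WHERE b.entB IS NULL      -- no partner row in TableB
      AND a.ent IS NOT NULL   -- the UNION keeps one NULL; do not count it
"""

unmatched = conn.execute(ANTI_JOIN).fetchone()[0]
print(unmatched)  # 2 (SomeText and zzzqqqq)
```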
To make this decently fast, I think you will need separate indexes on each column from TableA that's involved in this, as well as an index on entB in TableB. But you'll need to try it out, and do EXPLAIN if it still doesn't meet your performance needs.
Unless the machine running MySQL is really short on RAM, MySQL should handle this stuff reasonably efficiently.
Pro tip: You already know this. Denormalized tables like TableA can really mess up query performance.
Here is my SQL query below. I want to select values from the column names given as variables. Is there any appropriate way of doing this except using a dynamic query?
SELECT EPV.EmployeeCode, @RateOfEmployee, @RateOfEmployer
FROM [HR_EmployeeProvisions] EPV
One way to do this without dynamic SQL is a CASE expression, but it is ugly:
SELECT EPV.EmployeeCode, case @RateOfEmployee when 'RateOfEmployee' then RateOfEmployee
when 'X' then X
..
end , case @RateOfEmployer when 'RateOfEmployer' then RateOfEmployer
when 'Y' then Y
..
end
FROM [HR_EmployeeProvisions] EPV
You have to list every candidate column in the CASE expression.
You can't parameterize identifiers in SQL Server, and I doubt it's possible in any other relational database.
Your best choice is to use dynamic SQL.
Note that dynamic SQL is very often a security hazard, and you must defend your code against SQL injection attacks.
I would probably do something like this:
DECLARE @Sql nvarchar(500);
DECLARE @numberOfColumns int;
-- Count how many of the requested names are real columns of the table.
SELECT @numberOfColumns = COUNT(1)
FROM information_schema.columns
WHERE table_name = 'HR_EmployeeProvisions'
AND column_name IN (@RateOfEmployee, @RateOfEmployer);
IF @numberOfColumns = 2
BEGIN
-- QUOTENAME brackets each identifier; note the space before FROM.
SELECT @Sql = 'SELECT EmployeeCode, ' + QUOTENAME(@RateOfEmployee) + ', ' + QUOTENAME(@RateOfEmployer) +
' FROM HR_EmployeeProvisions';
EXEC(@Sql);
END
This way you make sure that the column names actually exist in the table, and QUOTENAME adds another layer of safety.
Note: your presentation layer should handle the case where the select is not performed because the column names are invalid.
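The same two-step defence (check the names against the table's real columns, then quote them) can be sketched on the client side as well. The following is a rough illustration using Python's built-in sqlite3 module, not SQL Server; `quote_identifier` and `select_rates` are hypothetical helper names, and the column types and sample row are assumptions.

```python
import sqlite3

TABLE = "HR_EmployeeProvisions"

def quote_identifier(name):
    # Double any embedded double quote, then wrap: a rough analogue of QUOTENAME.
    return '"' + name.replace('"', '""') + '"'

def select_rates(conn, employee_col, employer_col):
    # Whitelist check: both names must be real columns of the table.
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    known = {row[1] for row in conn.execute(f"PRAGMA table_info({quote_identifier(TABLE)})")}
    for col in (employee_col, employer_col):
        if col not in known:
            raise ValueError(f"unknown column: {col!r}")
    sql = (f"SELECT EmployeeCode, {quote_identifier(employee_col)}, "
           f"{quote_identifier(employer_col)} FROM {quote_identifier(TABLE)}")
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (EmployeeCode TEXT, RateOfEmployee REAL, RateOfEmployer REAL)")
conn.execute(f"INSERT INTO {TABLE} VALUES ('E1', 1.5, 2.5)")

rows = select_rates(conn, "RateOfEmployee", "RateOfEmployer")
print(rows)  # [('E1', 1.5, 2.5)]
```

Raising an error for an unknown name gives the caller something explicit to handle, which is the client-side counterpart of the select simply not running.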
Have a look at UNPIVOT clause - I'm not sure it is applicable for your case but in some circumstances it can be used to query a value by the column name without dynamic SQL:
create table t1 (
a int,
b int,
c int
);
insert into t1 values
(1, 11, 111),
(2, 22, 222),
(3, 33, 333);
select a, col_name, col_value from t1
unpivot (col_value for col_name in (b, c)) as dt;
Result:
| a | col_name | col_value |
|---|----------|-----------|
| 1 | b | 11 |
| 1 | c | 111 |
| 2 | b | 22 |
| 2 | c | 222 |
| 3 | b | 33 |
| 3 | c | 333 |
(SQL Fiddle)
If you only need a value in a depending on some condition on (dynamically) either b or c, you can build the condition on the unpivoted rows. If you need the values from either column b or c, you can add ... WHERE col_name = ?. If you need more columns, you'd probably need to filter the values on the unpivoted table and then pivot it again to get the values back in columns.
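To illustrate the WHERE col_name = ? idea, here is a small sketch with Python's built-in sqlite3 module. SQLite has no UNPIVOT, so the unpivoted rows are emulated with UNION ALL, which is a swapped-in technique and not what the answer's query does. One difference to keep in mind: unlike UNPIVOT, this emulation keeps rows whose value is NULL. `values_for` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INT, b INT, c INT);
    INSERT INTO t1 VALUES (1, 11, 111), (2, 22, 222), (3, 33, 333);
""")

# One SELECT per source column; the column name becomes ordinary data.
UNPIVOTED = """
    SELECT a, 'b' AS col_name, b AS col_value FROM t1
    UNION ALL
    SELECT a, 'c' AS col_name, c AS col_value FROM t1
"""

def values_for(conn, col_name):
    # The column name is now a bound parameter, so no dynamic SQL is needed.
    sql = f"SELECT a, col_value FROM ({UNPIVOTED}) AS dt WHERE col_name = ? ORDER BY a"
    return conn.execute(sql, (col_name,)).fetchall()

print(values_for(conn, "b"))  # [(1, 11), (2, 22), (3, 33)]
print(values_for(conn, "c"))  # [(1, 111), (2, 222), (3, 333)]
```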
I have a table which has records of user's vacation days.
A Sample of that would be:
+---------+-----------+---------+------------+
| country | user_name | user_id | vac_date |
+---------+-----------+---------+------------+
| canada | James | 1111 | 2015-02-13 |
| canada | James | 1111 | 2015-02-17 |
| canada | James | 1111 | 2015-02-18 |
| canada | James | 1111 | 2015-02-10 |
| canada | James | 1111 | 2015-02-11 |
+---------+-----------+---------+------------+
With the above data, the count would be 3 from feb 13th to feb 18th, because 14th and 15th are weekends and the 16th is a holiday here in Canada. So essentially, I am trying to hold and continue the count if the user took the next working day off. I also have a table that has all the holidays which includes the country and the date of the holiday. Sample data for the holiday table would be:
+---------+-------------+-------------+
| country | holidayDesc | holidayDate |
+---------+-------------+-------------+
| canada | Family Day | 2015-02-16 |
+---------+-------------+-------------+
Currently I have a SQL query that counts the dates normally, so it only counts whatever is in the vacation table. For example: if a user took March 3rd 2015, March 4th 2015, and March 5th 2015 off, then it will have a count of 3, but for the table example above, it would only have a count of 1 for Feb 13th and 2 for Feb 17th to Feb 18th.
SELECT DISTINCT user_name
,min(vac_date) as startDate
,max(vac_date) as endDate
,datediff(day, min(vac_date), max(vac_date)) as consecutiveCount
FROM (
SELECT user_name
,vac_date
,user_id
,groupDate = DATEADD(DAY, - ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY vac_date
), vac_date)
FROM mytable
WHERE country = 'canada'
AND vac_date BETWEEN '20150101'
AND '20151231'
) z
GROUP BY user_name
,groupDate
HAVING datediff(day, min(vac_date), max(vac_date)) >= 0
ORDER BY user_name
,min(vac_date);
This is what it currently outputs from the above sample data:
+-----------+------------+------------+------------------+
| user_name | startDate | endDate | consecutiveCount |
+-----------+------------+------------+------------------+
| James | 2015-02-10 | 2015-02-11 | 2 |
| James | 2015-02-13 | 2015-02-13 | 1 |
| James | 2015-02-17 | 2015-02-18 | 2 |
+-----------+------------+------------+------------------+
Ideally I would like it to be:
+-----------+------------+------------+------------------+
| user_name | startDate | endDate | consecutiveCount |
+-----------+------------+------------+------------------+
| James | 2015-02-10 | 2015-02-11 | 2 |
| James | 2015-02-13 | 2015-02-18 | 3 |
+-----------+------------+------------+------------------+
But I don't know if that is possible with pure SQL. I can also try to incorporate it into C#.
If it helps, I am also using C# and SQL Server Management Studio. Any help would be appreciated. Thanks in advance.
I tried to go a different route, but then found a fix for John Cappelletti's solution.
First you need to add weekend dates to your holiday table.
You can get a list of the dates between two dates using a function.
Then UNION ALL the vacation days with the holidays, but add a description field so you can tell the two apart.
There are some CROSS JOINs so that every country and user gets the holidays and weekends (needs testing).
SELECT [country],
[user_name], [user_id], [vac_date], 'vacation' as description
FROM vacations
UNION ALL
SELECT c.[country],
u.[user_name],
u.[user_id],
[holidayDate],
'holiday' as description
FROM holidays
CROSS JOIN (SELECT DISTINCT [country] FROM vacations) c
CROSS JOIN (SELECT DISTINCT [user_name], [user_id] FROM vacations) u
Then the final query is the same as John suggested, but this time you only count vacation days.
WITH joinDates as (
SELECT [country],
[user_name], [user_id], [vac_date], 'vacation' as description
FROM vacations
UNION ALL
SELECT c.[country],
u.[user_name],
u.[user_id],
[holidayDate],
'holiday' as description
FROM holidays
CROSS JOIN (SELECT DISTINCT [country] FROM vacations) c
CROSS JOIN (SELECT DISTINCT [user_name], [user_id] FROM vacations) u
)
Select user_name
,startDate = min(vac_date)
,endDate = max(vac_date)
,consecutiveCount = count(*)
From (
Select *
,Grp = Day(vac_date) - Row_Number() over (Partition By country,user_id
Order by vac_date)
From joinDates S
) A
WHERE description = 'vacation' -- only count vacation days ignore holiday/weekend
Group By user_name, Grp
Having count(*)>1
ORDER BY startDate
SQL DEMO
This seems like a classic Gaps & Islands with a little twist.
Declare @YourTable table (country varchar(25),user_name varchar(25),user_id varchar(25),vac_date date)
Insert Into @YourTable values
('canada','James','1111','2015-02-13'),
('canada','James','1111','2015-02-17'),
('canada','James','1111','2015-02-18'),
('canada','James','1111','2015-02-10'),
('canada','James','1111','2015-02-11')
Declare @Holiday table (country varchar(25),holidayDate date)
Insert Into @Holiday values
('canada','2015-02-16')
Select user_name
,startDate = min(vac_date)
,endDate = max(vac_date)
,consecutiveCount = sum(DayCnt)
From (
Select *
,Grp = Day(vac_date) - Row_Number() over (Partition By country,user_id Order by vac_date)
From (Select Country,user_name,user_id,vac_date,DayCnt=1 from @YourTable
Union All
Select A.Country,user_name,user_id,vac_date=b.holidayDate,DayCnt=1
From @YourTable A
Join @Holiday B on A.country=B.country and abs(DateDiff(DD,vac_date,holidayDate))=1
Union All
Select A.Country,user_name,user_id,vac_date=b.retval,DayCnt=0
From @YourTable A
Join (
Select * From [dbo].[udf-Range-Date]('2015-01-01','2017-12-31','DD',1) where DateName(WEEKDAY,RetVal) in ('Saturday','Sunday')
) B on abs(DateDiff(DD,vac_date,RetVal))=1
) S
) A
Group By user_name,Grp
Having Sum(DayCnt)>1
Returns
user_name startDate endDate consecutiveCount
James 2015-02-10 2015-02-11 2
James 2015-02-16 2015-02-18 3
The UDF to generate dynamic Date Ranges -- could be your own query
CREATE FUNCTION [dbo].[udf-Range-Date] (@R1 datetime,@R2 datetime,@Part varchar(10),@Incr int)
Returns Table
Return (
with cte0(M) As (Select 1+Case @Part When 'YY' then DateDiff(YY,@R1,@R2)/@Incr When 'QQ' then DateDiff(QQ,@R1,@R2)/@Incr When 'MM' then DateDiff(MM,@R1,@R2)/@Incr When 'WK' then DateDiff(WK,@R1,@R2)/@Incr When 'DD' then DateDiff(DD,@R1,@R2)/@Incr When 'HH' then DateDiff(HH,@R1,@R2)/@Incr When 'MI' then DateDiff(MI,@R1,@R2)/@Incr When 'SS' then DateDiff(SS,@R1,@R2)/@Incr End),
cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (Select M from cte0) Row_Number() over (Order By (Select NULL)) From cte1 a, cte1 b, cte1 c, cte1 d, cte1 e, cte1 f, cte1 g, cte1 h ),
cte3(N,D) As (Select 0,@R1 Union All Select N,Case @Part When 'YY' then DateAdd(YY, N*@Incr, @R1) When 'QQ' then DateAdd(QQ, N*@Incr, @R1) When 'MM' then DateAdd(MM, N*@Incr, @R1) When 'WK' then DateAdd(WK, N*@Incr, @R1) When 'DD' then DateAdd(DD, N*@Incr, @R1) When 'HH' then DateAdd(HH, N*@Incr, @R1) When 'MI' then DateAdd(MI, N*@Incr, @R1) When 'SS' then DateAdd(SS, N*@Incr, @R1) End From cte2 )
Select RetSeq = N+1
,RetVal = D
From cte3,cte0
Where D<=@R2
)
/*
Max 100 million observations -- Date Parts YY QQ MM WK DD HH MI SS
Syntax:
Select * from [dbo].[udf-Range-Date]('2016-10-01','2020-10-01','YY',1)
Select * from [dbo].[udf-Range-Date]('2016-01-01','2017-01-01','MM',1)
*/
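For comparison, the same gaps-and-islands rule can be sketched outside SQL. This is a rough client-side illustration in Python (not part of the T-SQL answer), assuming that two vacation days belong to one streak whenever every calendar day between them is a Saturday, Sunday, or holiday. The function names are hypothetical.

```python
from datetime import date, timedelta

def is_day_off(day, holidays):
    # date.weekday(): Monday is 0, so Saturday is 5 and Sunday is 6.
    return day.weekday() >= 5 or day in holidays

def vacation_streaks(vacation_days, holidays):
    """Return (start, end, count) tuples; count includes vacation days only."""
    streaks = []
    for day in sorted(set(vacation_days)):
        if streaks:
            start, end, count = streaks[-1]
            # Calendar days strictly between the previous vacation day and this one.
            gap = [end + timedelta(days=n) for n in range(1, (day - end).days)]
            if all(is_day_off(g, holidays) for g in gap):
                streaks[-1] = (start, day, count + 1)
                continue
        streaks.append((day, day, 1))
    return streaks

vacation = [date(2015, 2, 13), date(2015, 2, 17), date(2015, 2, 18),
            date(2015, 2, 10), date(2015, 2, 11)]
holidays = {date(2015, 2, 16)}  # Family Day

for start, end, count in vacation_streaks(vacation, holidays):
    print(start, end, count)
# 2015-02-10 2015-02-11 2
# 2015-02-13 2015-02-18 3
```

Because it works on real dates instead of Day(vac_date), this version is not affected by month boundaries.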
OK, my understanding of the question is that what you want to do is count spans of days off as only one day. Many businesses call this an "occurrence of absence" to differentiate absences by cause. In this case, you're trying to treat holidays as a continuance of the holiday (for time purposes) and if a holiday occurs on a Friday but the person takes Monday off, that should be one contiguous time out.
Personally, I'd do this in C# because of properties of the DateTime object that could make this a lot easier than trying to make a frankenquery. The code below assumes that you have an object called an Employee that contains its own record of DateTimes, like so:
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
public class Employee
{
public int ID {get;set;}
public string Name {get;set;}
public List<DateTime> DaysIWasOut {get;set;}
}
//place this method inside a class of your choice
public static int TimeOut(IEnumerable<Employee> employees)
{
int totalOutInstances = 0;
DataTable dt = HolidaysPlease(); //this refers to another method
//to fill the table. Just a basic SQLAdapter.Fill kind of thing.
//Basic so I won't waste time on it here.
//Assumes the table has a holidayDate column, as in the question's holiday table.
var holidays = dt.AsEnumerable().Select(t => Convert.ToDateTime(t["holidayDate"]).Date).ToList();
foreach(var e in employees)
{
totalOutInstances += e.DaysIWasOut.Count; //start from every day off, then merge consecutive ones
foreach(var d in e.DaysIWasOut)
{
//find the next working day, skipping weekends and holidays
var next = d.Date.AddDays(1);
while (next.DayOfWeek == DayOfWeek.Saturday || next.DayOfWeek == DayOfWeek.Sunday || holidays.Contains(next))
next = next.AddDays(1);
if(e.DaysIWasOut.Contains(next))
{totalOutInstances --; } //don't count that day
}
}
return totalOutInstances;
}
Okay so say I have something like this:
ID | Name | Address
1 | Bob | 123 Fake Street
1 | Bob | 221 Other Street
done by doing something like:
select p.ID, p.Name, a.Address from People p
inner join Addresses a on a.OwnerID = p.ID
Is there any way to turn that into
ID | Name | Address_1 | Address_2 | etc...
1 | Bob | 123 Fake Street | 221 Other street | etc
I've seen approaches that put comma-separated values in one column, but I don't want that; I want distinct columns. I am querying this using MSSQL and C#, in case that changes anything. Also, this is a made-up scenario that is just similar to what I'm doing, so the actual structure of the tables can't be changed.
Anyone have any suggestions?
You can use the PIVOT function to get the result but you will also have to implement using a row_number() so you can convert multiple addresses per person into columns.
If you had a known number of addresses, then you would hard-code the query:
select id, name, address_1, address_2
from
(
select p.id, p.name, a.address,
'Address_'+cast(row_number() over(partition by p.id
order by a.ownerid) as varchar(10)) rn
from people p
inner join addresses a
on p.id = a.ownerid
) d
pivot
(
max(address)
for rn in (address_1, address_2)
) piv;
See SQL Fiddle with Demo.
But in your case, you will have an unknown number of addresses per person, so you will want to use dynamic SQL and place it into a stored procedure to execute:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME('Address_'+d.rn)
from
(
select cast(row_number() over(partition by a.ownerid
order by a.ownerid) as varchar(10)) rn
from Addresses a
) d
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT id, name, ' + @cols + '
from
(
select p.id, p.name, a.address,
''Address_''+cast(row_number() over(partition by p.id
order by a.ownerid) as varchar(10)) rn
from people p
inner join addresses a
on p.id = a.ownerid
) d
pivot
(
max(address)
for rn in (' + @cols + ')
) p '
execute(@query);
See SQL Fiddle with Demo. These both give a result:
| ID | NAME | ADDRESS_1 | ADDRESS_2 | ADDRESS_3 |
----------------------------------------------------------------
| 1 | Bob | 123 Fake Street | 221 Other Street | (null) |
| 2 | Jim | 123 e main street | (null) | (null) |
| 3 | Tim | 489 North Drive | 56 June Street | 415 Lost |
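If the reshaping can happen in the application instead of the database, the same idea (number each person's addresses, then spread them into Address_N columns) needs no dynamic SQL. Below is a minimal sketch in Python, chosen here only for brevity since the question uses C#; `pivot_addresses` is a hypothetical helper and the input rows are assumed to be the output of the original join.

```python
from collections import defaultdict

def pivot_addresses(rows):
    """rows: iterable of (id, name, address). Returns (header, table_rows)."""
    grouped = defaultdict(list)
    for person_id, name, address in rows:
        grouped[(person_id, name)].append(address)
    # The widest person decides how many Address_N columns are needed.
    width = max((len(a) for a in grouped.values()), default=0)
    header = ["ID", "Name"] + [f"Address_{n}" for n in range(1, width + 1)]
    table = [
        # Pad shorter address lists with None, like the (null) cells in the SQL result.
        [person_id, name] + addresses + [None] * (width - len(addresses))
        for (person_id, name), addresses in sorted(grouped.items())
    ]
    return header, table

rows = [
    (1, "Bob", "123 Fake Street"),
    (1, "Bob", "221 Other Street"),
    (2, "Jim", "123 e main street"),
]
header, table = pivot_addresses(rows)
print(header)  # ['ID', 'Name', 'Address_1', 'Address_2']
for line in table:
    print(line)
# [1, 'Bob', '123 Fake Street', '221 Other Street']
# [2, 'Jim', '123 e main street', None]
```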
I am trying to pivot dynamic data using LINQ or lambdas in C#/MVC4 and have pretty much come to the conclusion that it's very difficult to do.
This is basically what I want to do:
I have been able to get this to work with known column names using this example: http://geekswithblogs.net/malisancube/archive/2009/04/21/using-lambda-or-linq-for-pivot-tables.aspx
But I can't find any examples for doing this with dynamic columns.
By dynamic columns I mean that at any time there could be a new row with a Name and FieldType that have not been in the table before, and it also needs to be turned into a column. Any pointers?
I don't know LINQ, so I will give you a version that can be used in a SQL Server stored procedure. This type of data transformation is known as a PIVOT. Since you are using SQL Server 2008+, you can use the PIVOT function.
If you know the values that you want to transform, then you can hard-code the values:
SELECT nodeid, rowid,[FirstName], [LastName], [Title]
FROM
(
SELECT nodeid, rowid, name, value
FROM yourTable
) x
PIVOT
(
max(value)
for name in ([FirstName], [LastName], [Title])
)p
See SQL Fiddle with Demo
Then if you have an unknown number of values, you can implement dynamic SQL:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(name)
from yourtable
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT nodeid, rowid,' + @cols + ' from
(
SELECT nodeid, rowid, name, value
FROM yourTable
) x
pivot
(
max(value)
for name in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo
Both return the results:
| NODEID | ROWID | FIRSTNAME | LASTNAME | TITLE |
--------------------------------------------------
| 1 | 1 | Alfred | Beagle | (null) |
| 1 | 2 | Freddy | (null) | (null) |
| 1 | 3 | (null) | Grey | Sir. |
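The same discover-the-names-then-build-the-query pattern can be tried without SQL Server. The sketch below uses Python's built-in sqlite3 module; SQLite has no PIVOT, so it swaps in one MAX(CASE ...) per discovered name. The input rows are an assumption reconstructed from the result table above, and `pivot_by_name` and `quote_identifier` are hypothetical helper names.

```python
import sqlite3

def quote_identifier(name):
    # Double any embedded double quote, then wrap: a rough analogue of QUOTENAME.
    return '"' + name.replace('"', '""') + '"'

def pivot_by_name(conn):
    # Step 1: discover the column names from the data itself.
    names = [r[0] for r in conn.execute("SELECT DISTINCT name FROM yourTable ORDER BY name")]
    # Step 2: one MAX(CASE ...) per name. The name is bound as a parameter for
    # the comparison and quoted as an identifier for the column alias.
    select_list = ["nodeid", "rowid"] + [
        f"MAX(CASE WHEN name = ? THEN value END) AS {quote_identifier(n)}" for n in names
    ]
    sql = (f"SELECT {', '.join(select_list)} FROM yourTable "
           "GROUP BY nodeid, rowid ORDER BY nodeid, rowid")
    cur = conn.execute(sql, names)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (nodeid INT, rowid INT, name TEXT, value TEXT);
    INSERT INTO yourTable VALUES
        (1, 1, 'FirstName', 'Alfred'),
        (1, 1, 'LastName',  'Beagle'),
        (1, 2, 'FirstName', 'Freddy'),
        (1, 3, 'LastName',  'Grey'),
        (1, 3, 'Title',     'Sir.');
""")

header, rows = pivot_by_name(conn)
print(header)  # ['nodeid', 'rowid', 'FirstName', 'LastName', 'Title']
for row in rows:
    print(row)
# (1, 1, 'Alfred', 'Beagle', None)
# (1, 2, 'Freddy', None, None)
# (1, 3, None, 'Grey', 'Sir.')
```

A new Name value in the table shows up as a new column the next time the function runs, which is the behaviour the question asks for.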