So I currently have a database table of about 70,000 names. What I want to do is take 3000 random records from that database and insert them into another table where each name has a row for all the other names. In other words, the new table should look like this:
John, jerry
john, alex
john, sam
jerry, alex
jerry, sam
alex, sam
This means that for n names I should be adding n(n-1)/2 rows to the table, one for every unordered pair. My current strategy is to use two nested for loops to add these rows one at a time, removing the first name from the list of names after each outer pass so that I don't get a duplicate record with the order reversed.
My question is this: is there a faster way to do this, perhaps through parallel for loops or PLINQ or some other option that I have not mentioned?
Given a table "Names" with an nvarchar(50) column "Name" with this data:
Adam
Bob
Charlie
Den
Eric
Fred
This query:
-- Work out the fraction we need
DECLARE @frac AS float;
SELECT @frac = CAST(35000 AS float) / 70000;
-- Get roughly that sample size
WITH ts AS (
SELECT Name FROM Names
WHERE @frac >= CAST(CHECKSUM(NEWID(), Name) & 0x7FFFFFFF AS float) / CAST(0x7FFFFFFF AS int)
)
-- Match each entry in the sample with all the other entries
SELECT x.Name + ', ' + y.Name
FROM ts AS X
CROSS JOIN
Names AS Y
WHERE x.Name <> y.Name
produces results of the form
Adam, Bob
Adam, Charlie
Adam, Den
Adam, Eric
Adam, Fred
Charlie, Adam
Charlie, Bob
Charlie, Den
Charlie, Eric
Charlie, Fred
Den, Adam
Den, Bob
Den, Charlie
Den, Eric
Den, Fred
The results will vary by run; a sample of 3000 out of 70000 will have approximately 3000 * 70000 result rows. I used 35000./70000 as the fraction because my test table had only 6 rows; for the real table, use 3000./70000.
If you want only the names from the sample used, change CROSS JOIN Names AS Y to CROSS JOIN ts AS Y, and there will then be approximately 3000 * 3000 result rows.
Reference: The random sample method was taken from the section "Important" in Limiting Result Sets by Using TABLESAMPLE.
You will need to figure out the random part
select t1.name, t2.name
from table t1
join table t2
on t1.name < t2.name
order by t1.name, t2.name
You need to materialize the newid
declare @t table (name varchar(10) primary key);
insert into @t (name) values
('Adam')
, ('Bob')
, ('Charlie')
, ('Den')
, ('Eric')
, ('Fred');
declare @top table (name varchar(10) primary key);
insert into @top (name)
select top (4) name from @t order by NEWID();
select * from @top;
select a.name, b.name
from @top a
join @top b
on a.name < b.name
order by a.name, b.name;
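The same idea (materialize the random sample once, then self-join it on `a.name < b.name`) can be tried out without a SQL Server instance. This is only a sketch in Python using the standard `sqlite3` module; the table names `names` and `sample` and the helper `sampled_pairs` are made up for the illustration, and SQLite's `ORDER BY RANDOM() LIMIT n` stands in for `TOP (n) ... ORDER BY NEWID()`.

```python
import sqlite3


def sampled_pairs(names, sample_size):
    """Pick a random sample of names and return every unordered pair once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO names (name) VALUES (?)", [(n,) for n in names])

    # Materialize the sample first so both sides of the join see the same rows.
    conn.execute("CREATE TABLE sample (name TEXT PRIMARY KEY)")
    conn.execute(
        "INSERT INTO sample (name) "
        "SELECT name FROM names ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    )

    # a.name < b.name keeps one ordering of each pair and drops self-pairs.
    pairs = conn.execute(
        "SELECT a.name, b.name FROM sample a "
        "JOIN sample b ON a.name < b.name "
        "ORDER BY a.name, b.name"
    ).fetchall()
    conn.close()
    return pairs


pairs = sampled_pairs(["Adam", "Bob", "Charlie", "Den", "Eric", "Fred"], 4)
print(len(pairs))  # 4 sampled names give 4 * 3 / 2 = 6 pairs
```

Which four names are picked changes from run to run, but the number of pairs does not.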
Using a Number table to simulate names, as a single query with a triangular join:
WITH all_names
AS (SELECT n,
'NAME_' + Cast(n AS VARCHAR(20)) NAME
FROM number
WHERE n < 70000),
rand_names
AS (SELECT TOP 3000 *
FROM all_names
ORDER BY Newid()),
ordered_names
AS (SELECT Row_number()
OVER (
ORDER BY NAME) rw_num,
NAME
FROM rand_names)
SELECT n1.NAME,
n2.NAME
FROM ordered_names n1
INNER JOIN ordered_names n2
ON n2.rw_num > n1.rw_num
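If the pairs are generated in application code instead, as the question's nested loops do, the triangular join corresponds to taking all 2-element combinations of the sample. A rough Python sketch, assuming the names fit in memory; `sample_pairs` and its `seed` parameter are hypothetical names introduced here, not anything from the original answers:

```python
import itertools
import random


def sample_pairs(names, sample_size, seed=None):
    """Return every unordered pair from a random sample of the names."""
    rng = random.Random(seed)
    # Sorting mirrors ROW_NUMBER() OVER (ORDER BY NAME) in the query above.
    sample = sorted(rng.sample(names, sample_size))
    # combinations() yields each pair once, like ON n2.rw_num > n1.rw_num.
    return list(itertools.combinations(sample, 2))


all_names = ["NAME_" + str(n) for n in range(1, 101)]
pairs = sample_pairs(all_names, 10, seed=1)
print(len(pairs))  # 10 * 9 / 2 = 45

# Row count to expect for the real case of 3000 sampled names.
print(3000 * 2999 // 2)  # 4498500
```

About 4.5 million rows is small enough that the cost is likely dominated by how the rows are inserted (one at a time versus a set-based or bulk insert), not by how the pairs are enumerated.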
Related
I need to make a query that provides me the most common values over columns,
the ones with the most occurrences.
For example:
Name Grade Gender
--------------------------
Jeff 100 Male
Daniel 100 Male
Linda 80 Female
Jeff 90 Male
The query will provide me a datarow with Name - Jeff Grade - 100 Gender - Male
The query I have so far is this:
SELECT
PhonesTBL.OperatingSystem, PhonesTBL.Memory,
PhonesTBL.BatterySize, PhonesTBL.CameraQuality, PhonesTBL.Processor,
PhonesTBL.ScreenSize, PhonesTBL.PhoneType
FROM
PhonesTBL
INNER JOIN
HistoryTBL ON PhonesTBL.PhoneID = HistoryTBL.PhoneID
WHERE
UserID = Uid
GROUP BY
OperatingSystem, Memory, BatterySize, CameraQuality, Processor,
ScreenSize, PhoneType
ORDER BY
COUNT(*) DESC
but it just returns the distinct combinations of values, not the most common value for each column.
Looking for help, Ohad
Your question doesn't really make sense. You have sample data that has nothing to do with your query. Let me use the sample data.
In MS Access, this is most simply done by putting the mode (the statistical name for what you want) in separate rows:
(select top (1) "name" as which, name
from t
group by name
order by count(*) desc, name
) union all
(select top (1) "grade" as which, grade
from t
group by grade
order by count(*) desc, grade
) union all
(select top (1) "gender" as which, gender
from t
group by gender
order by count(*) desc, gender
);
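To check the per-column mode idea against the question's sample data, here is a small Python sketch using `sqlite3`. It runs one `GROUP BY ... ORDER BY COUNT(*) DESC` query per column, with SQLite's `LIMIT 1` in place of `TOP (1)`; the table name `t` follows the answer, while `column_modes` is a helper name invented for this example.

```python
import sqlite3

ROWS = [
    ("Jeff", 100, "Male"),
    ("Daniel", 100, "Male"),
    ("Linda", 80, "Female"),
    ("Jeff", 90, "Male"),
]


def column_modes(rows):
    """Return the most frequent value of each column as a dict."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, grade INTEGER, gender TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    modes = {}
    # Column names come from this fixed tuple, so interpolating them is safe.
    for col in ("name", "grade", "gender"):
        query = (
            f"SELECT {col} FROM t GROUP BY {col} "
            f"ORDER BY COUNT(*) DESC, {col} LIMIT 1"
        )
        modes[col] = conn.execute(query).fetchone()[0]
    conn.close()
    return modes


modes = column_modes(ROWS)
print(modes)  # {'name': 'Jeff', 'grade': 100, 'gender': 'Male'}
```

Each column is counted independently, which is why the result can be a combination (Jeff, 100, Male) that exists as a row only by coincidence.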
I have a query for MS SQL Server that looks like this:
SELECT DISTINCT resource.locationurl,
resource.resourcename,
resource.anwserid,
checktotal.total
FROM resource
INNER JOIN (SELECT Count(DISTINCT anwserid) AS total,
resourcename
FROM resource AS Resource_1
WHERE ( anwserid IN (SELECT Cast(value AS INT) AS Expr1
FROM dbo.Udf_split(@sCategoryID, ',')
AS
udf_Split_1) )
GROUP BY resourcename) AS checktotal
ON resource.resourcename = checktotal.resourcename
WHERE ( resource.anwserid IN (SELECT Cast(value AS INT) AS Expr1
FROM dbo.Udf_split(@sCategoryID, ',') AS
udf_Split_1)
)
AND ( checktotal.total = @Total )
ORDER BY resource.resourcename
When I run this query, it gives me repeated values in the Resource.LocationURL column.
You can check it live here: http://www.ite.org/visionzero/toolbox/default2.aspx
On that page you can select some categories, but the results are not distinct.
I have tried everything I can think of and I am out of ideas, so please help me with this.
You misunderstand what DISTINCT means when you are fetching more than one column.
If you run this query:
SELECT DISTINCT col1, col2 FROM table
You are selecting every distinct combination of the two columns. An acceptable result would be
value 1_1, value 2_1
value 1_1, value 2_2
value 2_1, value 2_1
In this example, value 1_1 appears twice, but the two columns combined are unique.
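A quick way to see this behaviour is to run `SELECT DISTINCT` over two columns on a tiny table. The sketch below uses Python's `sqlite3` module with a throwaway table `t`; the helper name `distinct_pairs` and the extra duplicate row are assumptions added for the demonstration.

```python
import sqlite3


def distinct_pairs(rows):
    """Return the distinct (col1, col2) combinations, sorted for stable output."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT DISTINCT col1, col2 FROM t ORDER BY col1, col2"
    ).fetchall()
    conn.close()
    return result


rows = [
    ("1_1", "2_1"),
    ("1_1", "2_2"),
    ("2_1", "2_1"),
    ("1_1", "2_1"),  # exact duplicate of the first row
]
result = distinct_pairs(rows)
print(result)  # [('1_1', '2_1'), ('1_1', '2_2'), ('2_1', '2_1')]
```

Only the exact duplicate row is removed; `1_1` still appears twice in `col1` because the two rows differ in `col2`.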
My guess is that you are actually attempting to perform a grouping:
SELECT resource.locationurl,
resource.resourcename,
resource.anwserid,
Sum(checktotal.total)
FROM resource
INNER JOIN (SELECT Count(DISTINCT anwserid) AS total,
resourcename
FROM resource AS Resource_1
WHERE ( anwserid IN (SELECT Cast(value AS INT) AS Expr1
FROM dbo.Udf_split(@sCategoryID, ',')
AS
udf_Split_1) )
GROUP BY resourcename) AS checktotal
ON resource.resourcename = checktotal.resourcename
WHERE ( resource.anwserid IN (SELECT Cast(value AS INT) AS Expr1
FROM dbo.Udf_split(@sCategoryID, ',') AS
udf_Split_1)
)
AND ( checktotal.total = @Total )
GROUP BY resource.locationurl,
resource.resourcename,
resource.anwserid
First of all, the site you linked doesn't do anything.
Second, DISTINCT ensures unique rows. It will not make the values in each individual column unique as well. Just think about it: how would it work? You have two rows with the same locationurl field, but with otherwise distinct elements. Which one do you not include?
Lastly, please take greater care in phrasing your questions.
As I see it, your query applies DISTINCT across multiple columns,
so any two records that differ in at least one column both pass the DISTINCT condition.
Ex:
          locationurl | resourcename | anwserid | Sum(checktotal.total)
record1 : loc1        | res1         | 1        | 100
record2 : loc1        | res1         | 2        | 100
Finding a solution to an issue in my project
I have stages associated with contracts. That is, a contract can be in either Active stage, Process stage or Terminated stage.
I need to get the number of days the contract was in each stage.
For example, if a contract C1 was in Active stage from 20/10/2013 to 22/10/2013, then in the Process stage from 22/10/2013 to 25/10/2013 and finally in Terminated stage from 25/10/2013 to 26/10/2013 and then again in Active from 26/10/2013 to 28/10/2013, then I should get as result
Active = 4days
Process = 3days
Terminated = 1day (or something along those lines)
My table is created with these columns:
EntryId (primary key)
StageId (foreign key to Stage table)
ContractId (foreign key to contract table)
DateofStageChange
How to do this in SQL Server?
As asked, please find the table entries:
EntryID | Stage ID | Contract ID | DateChange
1       | A1       | C1          | 20/10/2013
2       | P1       | C1          | 22/10/2013
3       | T1       | C1          | 25/10/2013
4       | A1       | C1          | 26/10/2013
5       | P1       | C1          | 28/10/2013
6       | T1       | C1          | NULL (currently in this stage)
Need to use group by on Stage ID
It is important to check how the data is populated in your table. The following is based only on your sample data; note that if your EntryId values are not sequential, you can create a sequence using ROW_NUMBER.
declare @t table(EntryId int identity(1,1), StageId int, ContractId varchar(10), DateofStageChange date)
insert into @t values
(1,'C1','2013-10-20'),(1,'C1','2013-10-22'),(2,'C1','2013-10-22'),(2,'C1','2013-10-25')
,(3,'C1','2013-10-25'),(3,'C1','2013-10-26'),(1,'C1','2013-10-26'),(1,'C1','2013-10-28')
Select StageId,sum([noOfDays]) [totalNofDays] from
(select a.StageId,a.ContractId,a.DateofStageChange [Fromdate],b.DateofStageChange [ToDate]
,datediff(day,a.DateofStageChange,b.DateofStageChange) [noOfDays]
from @t a
inner join @t b on a.StageId=b.StageId and b.EntryId-a.EntryId=1)t4
group by StageId
You can't with your current structure.
You can get the number of days in the latest stage with datediff(d, DateOfStageChange, getdate()),
but you don't have any history, so you can't get the previous statuses.
This can be done in SQL with CTE.
You didn't provide your table names, so you'll need to change them where I've indicated below, but it would look like this:
;WITH cte
AS (
SELECT
DateofStageChange, StageID, ContractID,
ROW_NUMBER() OVER (ORDER BY ContractID, StageId, DateofStageChange) AS RowNum
FROM
DateOfStageChangeTable -- <==== Change this table name
)
SELECT
a.ContractId,
a.StageId,
Coalesce(Cast(sum(DATEDIFF(d, b.DateofStageChange, a.DateofStageChange)) as varchar(20)), 'CurrentState') as Days
FROM
cte AS A
LEFT OUTER JOIN
cte AS B
ON A.RowNum = B.RowNum + 1 and a.StageId = b.StageId and a.ContractId = b.ContractId
group by a.StageId, a.ContractId
This really is just a self join that creates a row number on a table, orders the table by StageID and date and then joins to itself. The first date on the first row of the stage id and date, joins to the second date on the second row, then the daterange is calculated in days.
This assumes that you only have 2 dates for each stage, if you have several, you would just need to do a min and max on the cte table.
EDIT:
Based on your sample data, the above query should work well. Let me know if you get any syntax errors and I'll fix them.
I added a coalesce to indicate the state they are currently in.
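Another way to read the question's sample entries is that each row marks the start of a stage, and that stage lasts until the date on the next row for the same contract. The Python sketch below works under that assumption (it is not the pairing used in the query above); the function name `days_per_stage` is hypothetical, and the final entry is treated as still open and therefore not counted.

```python
from collections import defaultdict
from datetime import date


def days_per_stage(entries):
    """Sum the days spent in each stage.

    entries: (stage_id, change_date) tuples for one contract, in date order.
    Each stage runs until the next entry's date; the last entry has no end
    date yet, so it contributes nothing.
    """
    totals = defaultdict(int)
    for (stage, start), (_next_stage, end) in zip(entries, entries[1:]):
        totals[stage] += (end - start).days
    return dict(totals)


entries = [
    ("A1", date(2013, 10, 20)),
    ("P1", date(2013, 10, 22)),
    ("T1", date(2013, 10, 25)),
    ("A1", date(2013, 10, 26)),
    ("P1", date(2013, 10, 28)),  # still open, so not counted
]
totals = days_per_stage(entries)
print(totals)  # {'A1': 4, 'P1': 3, 'T1': 1}
```

This reproduces the totals the question expects: Active 4 days, Process 3 days, Terminated 1 day.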
I created a query with multiple inner joins from 4 tables, but the results bring back duplicate records. Here is the code that I am using:
SELECT tblLoadStop.LoadID,
tblCustomer.CustomerID,
tblLoadMaster.BillingID,
tblLoadMaster.LoadID,
tblLoadMaster.PayBetween1,
LoadStopID,
tblLoadMaster.Paybetween2,
tblStopLocation.StopLocationID,
tblStopLocation.city,
tblStopLocation.state,
tblStopLocation.zipcode,
tblLoadSpecifications.LoadID,
tblLoadSpecifications.LoadSpecificationID,
Picks,
Stops,
Typeofshipment,
Weight,
LoadSpecClass,
Miles,
CommodityList,
OriginationCity,
OriginationState,
DestinationCity,
DestinationState,
LoadRate,
Status,
CompanyName,
Customerflag,
tblCustomer.CustomerID,
tblCustomer.AddressLine1,
tblCustomer.City,
tblCustomer.State,
tblCustomer.Zipcode,
CompanyPhoneNumber,
CompanyFaxNumber,
SCAC,
tblLoadMaster.Salesperson,
Change,
StopType
FROM tblLoadMaster
INNER JOIN tblLoadSpecifications
ON tblLoadSpecifications.LoadID = tblLoadMaster.LoadID
INNER JOIN tblLoadStop
ON tblLoadStop.LoadID = tblLoadMaster.LoadID
INNER JOIN tblStopLocation
ON tblStopLocation.StopLocationID = tblLoadStop.StopLocationID
INNER JOIN tblCustomer
ON tblCustomer.CustomerID = tblLoadMaster.CustomerID
WHERE tblLoadMaster.Phase LIKE '%2%'
ORDER BY tblLoadMaster.LoadID DESC;
This is the result that I get
Load ID Customer Salesperson Origin Destination Rate
-------------------------------------------------------------------------
13356 FedEx Alex Duluth New York 300
13356 FedEx Steve Florida Kansas 400
I only want the first row to show,
13356 FedEx Alex Duluth New York 300
and remove the bottom row,
13356 FedEx Steve Florida Kansas 400
The tblLoadStop Table has the duplicate record with a duplicate LoadID from tblloadMaster Table
One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your LoadID - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria (you're not very clear on how you decide which row to keep and which to ignore in your question).
So try something like this:
;WITH CTE AS
(
SELECT
LoadID, Customer, Salesperson, Origin, Destination, Rate,
RowNum = ROW_NUMBER() OVER(PARTITION BY LoadID ORDER BY tblLoadstopID ASC)
FROM
dbo.tblLoadMaster lm
......
WHERE
lm.Phase LIKE '%2%'
)
SELECT
LoadID, Customer, Salesperson, Origin, Destination, Rate
FROM
CTE
WHERE
RowNum = 1
Here, I am selecting only the "first" entry for each "partition" (i.e. for each LoadId) - ordered by some criteria (updated: order by tblLoadstopID - as you mentioned) you need to define in your CTE.
Does that come close to what you're looking for?
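The effect of `ROW_NUMBER() OVER (PARTITION BY LoadID ORDER BY ...)` followed by `RowNum = 1` is to keep one row per LoadID, namely the one that sorts first. A minimal Python sketch of that selection, using the two rows from the question; the `load_stop_id` values 1 and 2 are invented for the example, and `first_row_per_key` is a hypothetical helper:

```python
def first_row_per_key(rows, key, order):
    """Keep, for each value of `key`, the row with the smallest `order` value."""
    best = {}
    for row in rows:
        k = row[key]
        # Replace the stored row only if this one sorts earlier.
        if k not in best or row[order] < best[k][order]:
            best[k] = row
    return list(best.values())


rows = [
    {"load_id": 13356, "load_stop_id": 2, "salesperson": "Steve",
     "origin": "Florida", "destination": "Kansas", "rate": 400},
    {"load_id": 13356, "load_stop_id": 1, "salesperson": "Alex",
     "origin": "Duluth", "destination": "New York", "rate": 300},
]
kept = first_row_per_key(rows, key="load_id", order="load_stop_id")
print(kept[0]["salesperson"])  # Alex
```

Which row counts as "first" depends entirely on the ordering column, so that choice has to reflect the business rule for which stop should be shown.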
In essence I want to pick the best match of a prefix from the "Rate" table based on the TelephoneNumber field in the "Call" table. Given the example data below, '0123456789' would best match the prefix '012' whilst '0100000000' would best match the prefix '01'.
I've included some DML with some more examples of correct matches in the SQL comments.
There will be circa 70,000 rows in the rate table and the call table will have around 20 million rows. But there will be a restriction on the Select from the Call table based on a dateTime column so actually the query will only need to run over 0.5 million call rows.
The prefix in the Rate table can be up to 16 characters long.
I have no idea how to approach this in SQL, I'm currently thinking of writing a C# SQLCLR function to do it. Has anyone done anything similar? I'd appreciate any advice you have.
Example Data
Call table:
Id TelephoneNumber
1 0123456789
2 0100000000
3 0200000000
4 0780000000
5 0784000000
6 0987654321
Rate table:
Prefix    Scale
(empty)   1
01        1.1
012       1.2
02        2
078       3
0784      3.1
DML
create table Rate
(
Prefix nvarchar(16) not null,
Scale float not null
)
create table [Call]
(
Id bigint not null,
TelephoneNumber nvarchar(16) not null
)
insert into Rate (Prefix, Scale) values ('', 1)
insert into Rate (Prefix, Scale) values ('01', 1.1)
insert into Rate (Prefix, Scale) values ('012', 1.2)
insert into Rate (Prefix, Scale) values ('02', 2)
insert into Rate (Prefix, Scale) values ('078', 3)
insert into Rate (Prefix, Scale) values ('0784', 3.1)
insert into [Call] (Id, TelephoneNumber) values (1, '0123456789') --match 1.2
insert into [Call] (Id, TelephoneNumber) values (2, '0100000000') --match 1.1
insert into [Call] (Id, TelephoneNumber) values (3, '0200000000') --match 2
insert into [Call] (Id, TelephoneNumber) values (4, '0780000000') --match 3
insert into [Call] (Id, TelephoneNumber) values (5, '0784000000') --match 3.1
insert into [Call] (Id, TelephoneNumber) values (6, '0987654321') --match 1
Note: The last one '0987654321' matches the blank string because there are no better matches.
Since this is based on partial matching, a subselect would be the only viable option (unless, like LukeH assumes, every call is unique)
select
c.Id,
c.TelephoneNumber,
(select top 1
Scale
from Rate r
where c.TelephoneNumber like r.Prefix + '%' order by len(r.Prefix) desc -- longest matching prefix wins
) as Scale
from Call c
SELECT t.Id, t.TelephoneNumber, t.Prefix, t.Scale
FROM
(
SELECT *, ROW_NUMBER() OVER
(
PARTITION BY c.TelephoneNumber
ORDER BY LEN(r.Prefix) DESC -- longest matching prefix first
) AS RowNumber
FROM [call] AS c
INNER JOIN [rate] AS r
ON c.TelephoneNumber LIKE r.Prefix + '%'
) AS t
WHERE t.RowNumber = 1
ORDER BY t.Id
Try this one:
select Prefix, min(c.TelephoneNumber)
from Rate r
left outer join Call c on c.TelephoneNumber like left(Prefix + '0000000000', 10)
or c.TelephoneNumber like Prefix + '%'
group by Prefix
You can use a left join to try to find a "better" match, and then eliminate such matches in your where clause. e.g.:
select
*
from
Call c
inner join
Rate r
on
r.Prefix = SUBSTRING(c.TelephoneNumber,1,LEN(r.Prefix))
left join
Rate r_anti
on
r_anti.Prefix = SUBSTRING(c.TelephoneNumber,1,LEN(r_anti.Prefix)) and
LEN(r_anti.Prefix) > LEN(r.Prefix)
where
r_anti.Prefix is null
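To check a longest-prefix query against the expected matches listed in the question, the sample data can be loaded into an in-memory SQLite database from Python. This is a sketch only: the table names `rates` and `calls`, the column name `number`, and the helper `best_scales` are assumptions made for the example, and SQLite's `||`, `LENGTH`, and `LIMIT 1` replace T-SQL's `+`, `LEN`, and `TOP 1`.

```python
import sqlite3

RATES = [("", 1.0), ("01", 1.1), ("012", 1.2), ("02", 2.0), ("078", 3.0), ("0784", 3.1)]
CALLS = [
    (1, "0123456789"),
    (2, "0100000000"),
    (3, "0200000000"),
    (4, "0780000000"),
    (5, "0784000000"),
    (6, "0987654321"),
]


def best_scales(rates, calls):
    """Return {call id: scale of the longest prefix matching the number}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rates (prefix TEXT NOT NULL, scale REAL NOT NULL)")
    conn.execute("CREATE TABLE calls (id INTEGER NOT NULL, number TEXT NOT NULL)")
    conn.executemany("INSERT INTO rates VALUES (?, ?)", rates)
    conn.executemany("INSERT INTO calls VALUES (?, ?)", calls)
    rows = conn.execute(
        """
        SELECT c.id,
               (SELECT r.scale
                  FROM rates r
                 WHERE c.number LIKE r.prefix || '%'
                 ORDER BY LENGTH(r.prefix) DESC  -- longest prefix first
                 LIMIT 1) AS scale
          FROM calls c
         ORDER BY c.id
        """
    ).fetchall()
    conn.close()
    return dict(rows)


scales = best_scales(RATES, CALLS)
print(scales)  # {1: 1.2, 2: 1.1, 3: 2.0, 4: 3.0, 5: 3.1, 6: 1.0}
```

The empty prefix matches every number, so it acts as the fallback for call 6. On the real data volumes (about 70,000 prefixes against 0.5 million calls) a `LIKE` join of this shape may be slow, so it is worth measuring before ruling out other approaches.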