Removing Duplicate row from a table based on one column

I wrote a query with multiple inner joins across several tables, but the result contains duplicate records. Here is the code that I am using:
SELECT tblLoadStop.LoadID,
tblCustomer.CustomerID,
tblLoadMaster.BillingID,
tblLoadMaster.LoadID,
tblLoadMaster.PayBetween1,
LoadStopID,
tblLoadMaster.Paybetween2,
tblStopLocation.StopLocationID,
tblStopLocation.city,
tblStopLocation.state,
tblStopLocation.zipcode,
tblLoadSpecifications.LoadID,
tblLoadSpecifications.LoadSpecificationID,
Picks,
Stops,
Typeofshipment,
Weight,
LoadSpecClass,
Miles,
CommodityList,
OriginationCity,
OriginationState,
DestinationCity,
DestinationState,
LoadRate,
Status,
CompanyName,
Customerflag,
tblCustomer.CustomerID,
tblCustomer.AddressLine1,
tblCustomer.City,
tblCustomer.State,
tblCustomer.Zipcode,
CompanyPhoneNumber,
CompanyFaxNumber,
SCAC,
tblLoadMaster.Salesperson,
Change,
StopType
FROM tblLoadMaster
INNER JOIN tblLoadSpecifications
ON tblLoadSpecifications.LoadID = tblLoadMaster.LoadID
INNER JOIN tblLoadStop
ON tblLoadStop.LoadID = tblLoadMaster.LoadID
INNER JOIN tblStopLocation
ON tblStopLocation.StopLocationID = tblLoadStop.StopLocationID
INNER JOIN tblCustomer
ON tblCustomer.CustomerID = tblLoadMaster.CustomerID
WHERE tblLoadMaster.Phase LIKE '%2%'
ORDER BY tblLoadMaster.LoadID DESC;
This is the result that I get
Load ID Customer Salesperson Origin Destination Rate
-------------------------------------------------------------------------
13356 FedEx Alex Duluth New York 300
13356 FedEx Steve Florida Kansas 400
I only want the first row to show,
13356 FedEx Alex Duluth New York 300
and remove the bottom row,
13356 FedEx Steve Florida Kansas 400
The duplicates come from tblLoadStop, which has more than one row with the same LoadID from tblLoadMaster.

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your LoadID - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria (you're not very clear on how you decide which row to keep and which to ignore in your question).
So try something like this:
;WITH CTE AS
(
SELECT
LoadID, Customer, Salesperson, Origin, Destination, Rate,
RowNum = ROW_NUMBER() OVER(PARTITION BY LoadID ORDER BY tblLoadstopID ASC)
FROM
dbo.tblLoadMaster lm
......
WHERE
lm.Phase LIKE '%2%'
)
SELECT
LoadID, Customer, Salesperson, Origin, Destination, Rate
FROM
CTE
WHERE
RowNum = 1
Here, I am selecting only the "first" entry for each "partition" (i.e. for each LoadID), ordered by a criterion that you define in your CTE (updated: ORDER BY tblLoadstopID, as you mentioned).
Does that come close to what you're looking for?
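The ROW_NUMBER() idea above can be tried on a small in-memory table. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module (which needs SQLite 3.25 or newer for window functions) rather than SQL Server, and the table name joined and the helper first_row_per_load are hypothetical stand-ins for the asker's already-joined result.

```python
import sqlite3

def first_row_per_load(rows):
    """Keep one row per LoadID: the row with the lowest LoadStopID.

    rows: iterable of (LoadID, LoadStopID, Salesperson) tuples.
    'joined' is a hypothetical stand-in for the multi-table join result.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE joined (LoadID INT, LoadStopID INT, Salesperson TEXT)"
    )
    con.executemany("INSERT INTO joined VALUES (?, ?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT LoadID, LoadStopID, Salesperson,
                   -- number the rows 1, 2, 3, ... within each LoadID
                   ROW_NUMBER() OVER (
                       PARTITION BY LoadID ORDER BY LoadStopID
                   ) AS RowNum
            FROM joined
        )
        SELECT LoadID, LoadStopID, Salesperson
        FROM cte
        WHERE RowNum = 1
        ORDER BY LoadID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Calling it with two stops for load 13356 returns only the stop with the lowest LoadStopID for that load; changing the ORDER BY inside OVER() changes which row counts as "first".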

Related

Find values based on one column table from another table.Pivot

Help needed.
I have three tables
Building_master - columns: BuildingCode, Building Name
Floor_master - columns: FloorCode, Floor Name, BuildingCode
Room_Master - columns: RoomCode, RoomName, RoomFloor, RoomBulding
I want to fill the GridView when I select Building Name from Building_master table where the output will be something like below
Building Name: A
Floor
1 Room 101 Room 102 Room 103 Room 104
2 Room 201 Room 202 Room 203
3 Room 301 Room 302 Room 303 Room 304
Kindly help to create a SQL query for the desired output

To pivot over a fixed number of columns (that is, the maximum number of rooms per floor), you can join, then use window functions and conditional aggregation:
select
building_name,
floor_name,
max(case when rn = 1 then room_name end) room1,
max(case when rn = 2 then room_name end) room2,
max(case when rn = 3 then room_name end) room3
from (
select
b.building_code,
b.building_name,
f.floor_code,
f.floor_name,
r.room_name,
row_number() over(
partition by b.building_code, f.floor_code order by r.room_code
) rn
from building_master b
inner join floor_master f
on f.building_code = b.building_code
inner join room_master r
on r.room_floor = f.floor_code
and r.room_building = b.building_code
) t
group by building_code, building_name, floor_code, floor_name
I had to make a few guesses about the relationships in your schema, which you might need to review.
You can handle more rooms per floor by adding more max() expressions to the outer select.
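To see the conditional-aggregation pivot in action, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ assumed for ROW_NUMBER()). The single room table and the helper rooms_by_floor are hypothetical simplifications of the three-table schema guessed at above.

```python
import sqlite3

def rooms_by_floor(rooms):
    """Pivot (floor_code, room_name) rows into one row per floor.

    Returns tuples (floor_code, room1, room2, room3); missing rooms are None.
    The 'room' table is a hypothetical simplification of Room_Master.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE room (floor_code INT, room_name TEXT)")
    con.executemany("INSERT INTO room VALUES (?, ?)", rooms)
    query = """
        SELECT floor_code,
               MAX(CASE WHEN rn = 1 THEN room_name END) AS room1,
               MAX(CASE WHEN rn = 2 THEN room_name END) AS room2,
               MAX(CASE WHEN rn = 3 THEN room_name END) AS room3
        FROM (
            SELECT floor_code, room_name,
                   -- position of each room within its floor
                   ROW_NUMBER() OVER (
                       PARTITION BY floor_code ORDER BY room_name
                   ) AS rn
            FROM room
        ) AS t
        GROUP BY floor_code
        ORDER BY floor_code
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Each MAX(CASE ...) picks out the room whose row number matches the column position, so a floor with fewer rooms simply gets NULL (None in Python) in the trailing columns.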

Faster way to insert records into database

So I currently have a database table of about 70,000 names. What I want to do is take 3000 random records from that database and insert them into another table where each name has a row for all the other names. In other words, the new table should look like this:
John, jerry
john, alex
john, sam
jerry, alex
jerry, sam
alex, sam
This means that I should be adding roughly "summation of n" rows to the table, i.e. n(n-1)/2 pairs for n names. My current strategy is to use two nested for loops to add these rows one at a time, removing the first name from the list of names each time so that I don't get a duplicate record with the order reversed.
My question is this: is there a faster way to do this, perhaps through parallel for loops or PLINQ or some other option that I have not mentioned?

Given a table "Names" with an nvarchar(50) column "Name" with this data:
Adam
Bob
Charlie
Den
Eric
Fred
This query:
-- Work out the fraction we need
DECLARE @frac AS float;
SELECT @frac = CAST(35000 AS float) / 70000;
-- Get roughly that sample size
WITH ts AS (
SELECT Name FROM Names
WHERE @frac >= CAST(CHECKSUM(NEWID(), Name) & 0x7FFFFFFF AS float) / CAST (0X7FFFFFFF AS int)
)
)
-- Match each entry in the sample with all the other entries
SELECT x.Name + ', ' + y.Name
FROM ts AS X
CROSS JOIN
Names AS Y
WHERE x.Name <> y.Name
produces results of the form
Adam, Bob
Adam, Charlie
Adam, Den
Adam, Eric
Adam, Fred
Charlie, Adam
Charlie, Bob
Charlie, Den
Charlie, Eric
Charlie, Fred
Den, Adam
Den, Bob
Den, Charlie
Den, Eric
Den, Fred
The results will vary by run; a sample of 3000 out of 70000 will have approximately 3000 * 70000 result rows. I used 35000./70000 because the sample size I used was only 6.
If you want only the names from the sample used, change CROSS JOIN Names AS Y to CROSS JOIN ts AS Y, and there will then be approximately 3000 * 3000 result rows.
Reference: The random sample method was taken from the section "Important" in Limiting Result Sets by Using TABLESAMPLE.

You will need to figure out the random part
select t1.name, t2.name
from table t1
join table t2
on t1.name < t2.name
order by t1.name, t2.name
You need to materialize the newid
declare @t table (name varchar(10) primary key);
insert into @t (name) values
('Adam')
, ('Bob')
, ('Charlie')
, ('Den')
, ('Eric')
, ('Fred');
declare @top table (name varchar(10) primary key);
insert into @top (name)
select top (4) name from @t order by NEWID();
select * from @top;
select a.name, b.name
from @top a
join @top b
on a.name < b.name
order by a.name, b.name;

Using a Number table to simulate names.
single query, using a triangular join
WITH all_names
AS (SELECT n,
'NAME_' + Cast(n AS VARCHAR(20)) NAME
FROM number
WHERE n < 70000),
rand_names
AS (SELECT TOP 3000 *
FROM all_names
ORDER BY Newid()),
ordered_names
AS (SELECT Row_number()
OVER (
ORDER BY NAME) rw_num,
NAME
FROM rand_names)
SELECT n1.NAME,
n2.NAME
FROM ordered_names n1
INNER JOIN ordered_names n2
ON n2.rw_num > n1.rw_num
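
The answers above share one idea: take the random sample once, then pair each name only with names that sort after it, so that no pair appears twice in reversed order. As a rough illustration outside the database, assuming the names are unique, the same logic can be sketched in Python; the helper sample_pairs and its seed parameter are hypothetical and only there to make runs repeatable.

```python
import random
from itertools import combinations

def sample_pairs(names, sample_size, seed=None):
    """Pick sample_size distinct names at random and return every
    unordered pair exactly once, as (smaller, larger) tuples."""
    rng = random.Random(seed)
    # materialize the sample once, like the table variable in the answer
    sample = sorted(rng.sample(list(names), sample_size))
    # combinations() on a sorted list is the "a.name < b.name" triangular join
    return list(combinations(sample, 2))
```

A sample of n names yields n(n-1)/2 pairs, so 3000 names give 4,498,500 rows; generating them in one set-based statement on the server is likely to be faster than inserting them one at a time from client-side loops.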

How to merge SQL Server data efficiently?

I have three tables in SQL Server where I need to combine all matching rows from all tables into a fourth MergedTable that will contain all the columns from the three individual tables based on the U_ID column.
Is there a way of doing this via T-SQL in a stored procedure, or should I just create a loop function in C#?
Bottom line is this is going to be executed from a command from a website, so it needs to be something I can encapsulate into an MVC project or component.
Here is an example of the tables.
Table 1:
U_ID ClientNumber OrderDate Amount
---------------------------------------------
BB000Kw 1920384 5/14/2013 1093.39
AA000bM 3839484 12/8/2012 584.42
AA000gH 8294848 2/28/2014 4849.38
AA000md 3849484 4/31/2013 590.84
AA000mF 3998398 3/29/2013 448.82
AA000mG 9944848 11/28/2014 98.85
AA000mn 0292938 10/31/2012 300.48
Table 2:
U_ID Name Date
------------------------------------------
AA000bM "Krivis, Jeffrey" 7/1/2002
AA000bv "Saydah, Michael" 7/30/2002
AA000cA "Byrne, Richard" 4/21/2003
AA000dd "McNeil, Joseph" 6/10/2003
AA000dH "Greenberg, Arnold" 1/16/2003
AA000gH "Rich, Elwood" 7/5/2003
AA000id "O'Neill, Robert J." 11/20/2002
AA000jf "Patsey, Richard" 4/22/2003
AA000jr "Jones, Arthur" 7/1/2002
AA000jU "Toff, Ronald" 7/15/2002
AA000k4 "Anderson, Carl" 8/14/2002
BB000Kw "Wilson, Sam" 3/9/2003
Table 3:
U_ID Name
-----------------------------
AA000bM Acme Company
AA000jr Stockwell Industries
BB000ke Gensen Motors
BB999di Falstaff Cards
BB000dl B and R Printing
BB000Kw Go Golf Carts
AA000gH Rich's Sandwiches
Resulting merged table
U_ID ClientNumber OrderDate Amount CustomerName JoinDate CompanyName
-------------------------------------------------------------------------------------------------------
BB000Kw 1920384 5/14/2013 1093.39 "Wilson, Sam" 3/9/2003 Go Golf Carts
AA000bM 3839484 12/8/2012 584.42 "Krivis, Jeffrey" 7/1/2002 Acme Company
AA000gH 8294848 2/28/2014 4849.38 "Rich, Elwood" 7/5/2003 Rich's Sandwiches
Table 1 is the master table that the others are matched to. You can see from the result that there will be only a subset of all the tables based on those that are matched from Table 1.
I'll be using MVC with the Entity Framework 6 and Linq-to-Entities, but if a T-SQL script is more efficient, then I should probably use that instead.
Which is the better way to go to get this result?

If you want to create a new table you can use SELECT ... INTO ... FROM ... query. In your case it would look like this:
SELECT t1.U_ID, t1.ClientNumber, t1.OrderDate, t1.Amount,
t2.Name as CustomerName, t2.Date as JoinDate,
t3.Name as CompanyName
INTO dbo.ResultingMergedTable
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.U_ID = t2.U_ID
INNER JOIN Table3 t3 ON t1.U_ID = t3.U_ID
Keep in mind that if the tables are really big, this will take a long time to execute.
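As a quick way to check which rows such a merge keeps, here is a small sketch using Python's built-in sqlite3 module. SQLite has no SELECT ... INTO, so the sketch uses CREATE TABLE ... AS SELECT as the nearest equivalent; the reduced column set and the helper merge_tables are assumptions made for brevity.

```python
import sqlite3

def merge_tables(orders, customers, companies):
    """Build MergedTable from three inputs joined on U_ID.

    orders:    (U_ID, Amount) rows, the master table
    customers: (U_ID, Name) rows
    companies: (U_ID, Name) rows
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (U_ID TEXT, Amount REAL);
        CREATE TABLE Table2 (U_ID TEXT, Name TEXT);
        CREATE TABLE Table3 (U_ID TEXT, Name TEXT);
    """)
    con.executemany("INSERT INTO Table1 VALUES (?, ?)", orders)
    con.executemany("INSERT INTO Table2 VALUES (?, ?)", customers)
    con.executemany("INSERT INTO Table3 VALUES (?, ?)", companies)
    # inner joins: only U_IDs present in all three tables survive
    con.execute("""
        CREATE TABLE MergedTable AS
        SELECT t1.U_ID, t1.Amount,
               t2.Name AS CustomerName,
               t3.Name AS CompanyName
        FROM Table1 t1
        INNER JOIN Table2 t2 ON t1.U_ID = t2.U_ID
        INNER JOIN Table3 t3 ON t1.U_ID = t3.U_ID
    """)
    rows = con.execute("SELECT * FROM MergedTable ORDER BY U_ID").fetchall()
    con.close()
    return rows
```

Rows of the master table without a match in either of the other two tables are dropped, which matches the expected result shown in the question.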

You can create a fourth table to do what you mentioned, but in SQL you can also create a view to do the same thing. A view is a virtual table. We use views when we partition data as well as to build a detailed record like the one described above.
http://msdn.microsoft.com/en-us/library/ms187956.aspx
http://www.sqlinfo.net/sqlserver/sql_server_VIEWS_the_basics.php
CREATE VIEW DetailView AS
(
SELECT
-- table1
t1.U_ID,
t1.ClientNumber,
t1.OrderDate,
t1.Amount,
-- table2
t2.Name,
t2.Date as [JoinDate],
-- table3
t3.Name as [Company]
FROM
table1 t1
-- inner joins keep only the rows that have a match in all three tables
INNER JOIN
table2 t2
ON t1.U_ID = t2.U_ID
INNER JOIN
table3 t3
ON t1.U_ID = t3.U_ID
)

Stored Procedure for date ranges in a single column

Finding a solution to an issue in my project
I have stages associated with contracts. That is, a contract can be in either Active stage, Process stage or Terminated stage.
I need to get the number of days the contract was in each stage.
For example, if a contract C1 was in Active stage from 20/10/2013 to 22/10/2013, then in the Process stage from 22/10/2013 to 25/10/2013 and finally in Terminated stage from 25/10/2013 to 26/10/2013 and then again in Active from 26/10/2013 to 28/10/2013, then I should get as result
Active = 4days
Process = 3days
Terminated = 1day (or something similar)
My table is created with these columns:
EntryId (primary key)
StageId (foreign key to Stage table)
ContractId (foreign key to contract table)
DateofStageChange
How to do this in SQL Server?
As requested, here are the table entries:
EntryID | Stage ID | Contract ID | DateChange
1 | A1 | C1 |20/10/2013
2 | P1 | C1 |22/10/2013
3 | T1 | C1 |25/10/2013
4 | A1 | C1 |26/10/2013
5 | P1 | C1 |28/10/2013
6 | T1 | C1 |Null(currently in this stage)
Need to use group by on Stage ID

It is important to check how the data is populated in your table. The following is based only on your sample data. Note that if your EntryId values are not sequential, you can generate a sequence with ROW_NUMBER().
declare @t table(EntryId int identity(1,1), StageId int,ContractId varchar(10),DateofStageChange date)
insert into @t values
(1,'C1','2013-10-20'),(1,'C1','2013-10-22'),(2,'C1','2013-10-22'),(2,'C1','2013-10-25')
,(3,'C1','2013-10-25'),(3,'C1','2013-10-26'),(1,'C1','2013-10-26'),(1,'C1','2013-10-28')
Select StageId,sum([noOfDays]) [totalNofDays] from
(select a.StageId,a.ContractId,a.DateofStageChange [Fromdate],b.DateofStageChange [ToDate]
,datediff(day,a.DateofStageChange,b.DateofStageChange) [noOfDays]
from @t a
inner join @t b on a.StageId=b.StageId and b.EntryId-a.EntryId=1)t4
group by StageId

You can't with your current structure.
You can get the latest one by doing datediff(d, getdate(), DateOfStageChange)
but you don't have any history so you can't get previous status

This can be done in SQL with CTE.
You didn't provide your table names, so you'll need to change the name where I've indicated below, but it would look like this:
;WITH cte
AS (
SELECT
DateofStageChange, StageID, ContractID,
ROW_NUMBER() OVER (ORDER BY ContractID, StageId, DateofStageChange) AS RowNum
FROM
DateOfStageChangeTable -- <==== Change this table name
)
SELECT
a.ContractId,
a.StageId,
Coalesce(cast(sum(DATEDIFF(d, b.DateofStageChange, a.DateofStageChange)) as varchar(20)), 'CurrentState') as Days
FROM
cte AS A
LEFT OUTER JOIN
cte AS B
ON A.RowNum = B.RowNum + 1 and a.StageId = b.StageId and a.ContractId = b.ContractId
group by a.StageId, a.ContractId
This really is just a self join that creates a row number on a table, orders the table by StageID and date and then joins to itself. The first date on the first row of the stage id and date, joins to the second date on the second row, then the daterange is calculated in days.
This assumes that you only have 2 dates for each stage, if you have several, you would just need to do a min and max on the cte table.
EDIT:
Based on your sample data, the above query should work well. Let me know if you get any syntax errors and I'll fix them.
I added a coalesce to indicate the state they are currently in.
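
For the table layout shown in the question (one row per stage change), another way to look at the problem is that each stage lasts until the date of the contract's next entry. The sketch below illustrates that with the LEAD() window function, which SQL Server offers from version 2012 onward. It is only an illustration run through Python's built-in sqlite3 module (SQLite 3.25+ assumed); the table name StageHistory, the helper days_per_stage, and the ISO date strings are assumptions, and the still-open current stage is simply left out of the totals.

```python
import sqlite3

def days_per_stage(entries):
    """Total days per (ContractId, StageId).

    entries: (EntryId, StageId, ContractId, 'YYYY-MM-DD') tuples.
    A stage runs from its own date to the date of the next entry
    of the same contract; the last (open) stage is not counted.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE StageHistory ("
        "EntryId INT, StageId TEXT, ContractId TEXT, DateofStageChange TEXT)"
    )
    con.executemany("INSERT INTO StageHistory VALUES (?, ?, ?, ?)", entries)
    query = """
        WITH spans AS (
            SELECT ContractId, StageId,
                   DateofStageChange AS from_date,
                   -- date of the next stage change for the same contract
                   LEAD(DateofStageChange) OVER (
                       PARTITION BY ContractId ORDER BY EntryId
                   ) AS to_date
            FROM StageHistory
        )
        SELECT ContractId, StageId,
               CAST(SUM(julianday(to_date) - julianday(from_date)) AS INTEGER)
        FROM spans
        WHERE to_date IS NOT NULL
        GROUP BY ContractId, StageId
        ORDER BY ContractId, StageId
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

With the first five entries from the question this gives 4 days for A1, 3 for P1 and 1 for T1, which are the totals the asker expects. In T-SQL the day difference would be computed with DATEDIFF(day, from_date, to_date) instead of julianday().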

What is a good approach in MS SQL Server 2008 to join on a "best" match?

In essence I want to pick the best match of a prefix from the "Rate" table based on the TelephoneNumber field in the "Call" table. Given the example data below, '0123456789' would best match the prefix '012' whilst '0100000000' would best match the prefix '01'.
I've included some DML with some more examples of correct matches in the SQL comments.
There will be circa 70,000 rows in the rate table and the call table will have around 20 million rows. But there will be a restriction on the Select from the Call table based on a dateTime column so actually the query will only need to run over 0.5 million call rows.
The prefix in the Rate table can be up to 16 characters long.
I have no idea how to approach this in SQL, I'm currently thinking of writing a C# SQLCLR function to do it. Has anyone done anything similar? I'd appreciate any advice you have.
Example Data
Call table:
Id TelephoneNumber
1 0123456789
2 0100000000
3 0200000000
4 0780000000
5 0784000000
6 0987654321
Rate table:
Prefix Scale
1
01 1.1
012 1.2
02 2
078 3
0784 3.1
DML
create table Rate
(
Prefix nvarchar(16) not null,
Scale float not null
)
create table [Call]
(
Id bigint not null,
TelephoneNumber nvarchar(16) not null
)
insert into Rate (Prefix, Scale) values ('', 1)
insert into Rate (Prefix, Scale) values ('01', 1.1)
insert into Rate (Prefix, Scale) values ('012', 1.2)
insert into Rate (Prefix, Scale) values ('02', 2)
insert into Rate (Prefix, Scale) values ('078', 3)
insert into Rate (Prefix, Scale) values ('0784', 3.1)
insert into [Call] (Id, TelephoneNumber) values (1, '0123456789') --match 1.2
insert into [Call] (Id, TelephoneNumber) values (2, '0100000000') --match 1.1
insert into [Call] (Id, TelephoneNumber) values (3, '0200000000') --match 2
insert into [Call] (Id, TelephoneNumber) values (4, '0780000000') --match 3
insert into [Call] (Id, TelephoneNumber) values (5, '0784000000') --match 3.1
insert into [Call] (Id, TelephoneNumber) values (6, '0987654321') --match 1
Note: The last one '0987654321' matches the blank string because there are no better matches.

Since this is based on partial matching, a subselect would be the only viable option (unless, like LukeH assumes, every call is unique)
select
c.Id,
c.TelephoneNumber,
(select top 1
Scale
from Rate r
where c.TelephoneNumber like r.Prefix + '%' order by len(r.Prefix) desc -- longest matching prefix wins
) as Scale
from Call c

SELECT t.Id, t.TelephoneNumber, t.Prefix, t.Scale
FROM
(
SELECT *, ROW_NUMBER() OVER
(
PARTITION BY c.TelephoneNumber
ORDER BY LEN(r.Prefix) DESC -- longest matching prefix wins
) AS RowNumber
FROM [call] AS c
INNER JOIN [rate] AS r
ON c.TelephoneNumber LIKE r.Prefix + '%'
) AS t
WHERE t.RowNumber = 1
ORDER BY t.Id

Try this one:
select Prefix, min(c.TelephoneNumber)
from Rate r
left outer join Call c on c.TelephoneNumber like left(Prefix + '0000000000', 10)
or c.TelephoneNumber like Prefix + '%'
group by Prefix

You can use a left join to try to find a "better" match, and then eliminate such matches in your where clause. e.g.:
select
*
from
Call c
inner join
Rate r
on
r.Prefix = SUBSTRING(c.TelephoneNumber,1,LEN(r.Prefix))
left join
Rate r_anti
on
r_anti.Prefix = SUBSTRING(c.TelephoneNumber,1,LEN(r_anti.Prefix)) and
LEN(r_anti.Prefix) > LEN(r.Prefix)
where
r_anti.Prefix is null
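
The "best match" in these answers means the longest prefix that the telephone number starts with. As a small check against the sample data, the sketch below ranks the matching prefixes per call by length and keeps the top one. It runs through Python's built-in sqlite3 module (SQLite 3.25+ assumed) rather than SQL Server, and the helper best_rates is hypothetical.

```python
import sqlite3

def best_rates(rates, calls):
    """For each call, return (Id, TelephoneNumber, Prefix, Scale)
    for the longest Rate.Prefix that the number starts with.

    rates: (Prefix, Scale) tuples; calls: (Id, TelephoneNumber) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Rate (Prefix TEXT, Scale REAL)")
    con.execute("CREATE TABLE Call (Id INT, TelephoneNumber TEXT)")
    con.executemany("INSERT INTO Rate VALUES (?, ?)", rates)
    con.executemany("INSERT INTO Call VALUES (?, ?)", calls)
    query = """
        SELECT Id, TelephoneNumber, Prefix, Scale
        FROM (
            SELECT c.Id, c.TelephoneNumber, r.Prefix, r.Scale,
                   -- rank candidate prefixes per call, longest first
                   ROW_NUMBER() OVER (
                       PARTITION BY c.Id ORDER BY LENGTH(r.Prefix) DESC
                   ) AS rn
            FROM Call c
            JOIN Rate r
              ON substr(c.TelephoneNumber, 1, LENGTH(r.Prefix)) = r.Prefix
        )
        WHERE rn = 1
        ORDER BY Id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

The empty prefix matches every number, so each call gets at least one candidate, and the last sample call falls back to scale 1 as the question requires. How well this performs on 0.5 million calls against 70,000 rates in SQL Server would need to be measured.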
