SQL Server 2008: re-increment table after deletion - C#

Using SQL Server 2008 and MS Visual Studio 2012, C# .NET 4.5.
I asked a similar question last week that was solved with the following query:
DECLARE @from int = 9, @to int = 3
UPDATE MainPareto
SET pareto = m.new_pareto
FROM (
    SELECT pKey, -- this is your primary key for the table
        new_pareto = ROW_NUMBER()
            OVER (ORDER BY CASE WHEN pareto = @from THEN @to ELSE pareto END,
                           CASE WHEN pareto = @from THEN 0 ELSE 1 END)
    FROM MainPareto
    -- put in any conditions that you want to restrict the scores by.
    WHERE PG = @pg AND pareto IS NOT NULL
    -- end conditions
) AS m
INNER JOIN MainPareto ON MainPareto.pKey = m.pKey
WHERE MainPareto.pareto <> m.new_pareto
As you can see this works great and increments the "league" when changes are made.
Now, after some added functionality, a user has requested deletion and recovery.
On my WinForm, the user can right-click the grid and delete the "part" number.
The user can also recover it if needed.
However, I need a stored procedure that will re-sort the grid and renumber it the way my current query does, after a deletion has been made by another stored procedure. My WinForm will sort that part out, but I do need a procedure that can do what my current one does after a deletion.
I hope you guys understand; if not, ask me and I'll clarify as best I can.

I am not totally sure if this is what you are looking for, but this is how you can reseed your Primary Key column (if your primary key is also an identity). Notice how my insert after the truncate does not include Column 1 (the primary key column).
select *
into #temp
from MainPareto
truncate table MainPareto
insert into MainPareto (col2, col3, col4) --...
select col2, col3, col4 --...
from #temp
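
If the goal is simply to re-run the renumbering after a delete, another option is to wrap the original UPDATE in a stored procedure and call it from your delete/recover procedures (or from the WinForm). This is only a minimal sketch reusing the MainPareto, pKey, pareto and PG names from the question; the procedure name and the @pg parameter type are assumptions. Ordering by the existing pareto value simply closes the gap left by the deleted row:
-- Sketch only: renumber pareto within one product group after a delete.
-- dbo.ResequencePareto and the nvarchar(50) type for @pg are assumptions.
CREATE PROCEDURE dbo.ResequencePareto
    @pg nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE MainPareto
    SET pareto = m.new_pareto
    FROM (
        SELECT pKey,
               new_pareto = ROW_NUMBER() OVER (ORDER BY pareto)
        FROM MainPareto
        WHERE PG = @pg AND pareto IS NOT NULL
    ) AS m
    INNER JOIN MainPareto ON MainPareto.pKey = m.pKey
    WHERE MainPareto.pareto <> m.new_pareto;
END
GO

-- Usage (hypothetical PG value):
-- EXEC dbo.ResequencePareto @pg = N'PG1';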

Related

How to delete data faster from a large table in SQL Server?

I have a huge table (log) which keeps some history data. It has more than 10 columns:
Id, Year, Month, Day, data1, data2, data3, ......
Because the table is huge, it has a lot of indexes on it.
The system keeps inserting lots of new data into this table. However, because of the way the system works, duplicated data is sometimes inserted (only the id is different). The duplicates' ids (id only) are also inserted into another table (log_existing). We have another service which deletes the duplicates from both tables. Here is what we are doing now.
SET @TotalRows = 0;
SET @Rows = 0;
WHILE 1 = 1
BEGIN
    DECLARE @Ids TABLE (id BIGINT);
    INSERT INTO @Ids
    SELECT TOP (@BatchSize) Id
    FROM Log

    DELETE FROM Log WHERE Id IN (SELECT id FROM @Ids)
    DELETE FROM Log_Existing WHERE Id IN (SELECT id FROM @Ids)
    SET @Rows = @@ROWCOUNT

    IF (@Rows < @BatchSize)
    BEGIN
        BREAK;
    END

    SET @TotalRows = @TotalRows + @Rows
    IF (@TotalRows >= @DeleteSize)
    BEGIN
        BREAK;
    END
    SET @Rows = 0;
END
Basically, the service runs every 2 minutes (or 5 minutes, configurable) to run this batch delete. @BatchSize = 2000 and @DeleteSize = 1000000, and a run usually takes more than the 2/5 minutes.
It worked OK for some time. But now we realise there are too many duplicates and this process cannot delete them fast enough, so the database size grows larger and larger and the process gets slower and slower.
Is there a way to make it faster, or some kind of guideline?
Thanks
I would try to avoid inserting duplicates into the Log table in the first place. From your description this should be possible, since some combination of columns (besides the Id) makes an entry unique.
One option is the IGNORE_DUP_KEY option on a unique index. When such an index exists and an INSERT statement tries to insert a row that violates the index's unique constraint, the INSERT is ignored. See the Microsoft SQL Server documentation.
CREATE TABLE #Test (C1 nvarchar(10), C2 nvarchar(50), C3 datetime);
GO
CREATE UNIQUE INDEX AK_Index ON #Test (C2)
WITH (IGNORE_DUP_KEY = ON);
GO
INSERT INTO #Test VALUES (N'OC', N'Ounces', GETDATE());
INSERT INTO #Test SELECT * FROM Production.UnitMeasure;
GO
SELECT COUNT(*) AS [Number of rows] FROM #Test;
GO
DROP TABLE #Test;
GO
I think a DELETE statement with a JOIN clause, something like the one below, should do better than the IN (SELECT ...) pattern. Note that SQL Server cannot delete from two tables in a single statement, so Log_Existing needs its own DELETE:
DELETE L
FROM Log AS L
INNER JOIN Log_Existing AS E ON L.LOGID = E.LOGID

How to insert huge dummy data into SQL Server

The development team has finished their application, and as a tester I need to insert 1,000,000 records into 20 tables for performance testing.
I have gone through the tables, and there are relationships between all of them.
To insert that much dummy data I would need to understand the application completely in a very short time, and I do not have any dummy data yet either.
Is there any way in SQL Server to insert this much data?
Please share your approaches.
Currently I am planning to create the dummy data in Excel, but I am not sure about the relationships between the tables.
I found on Google that SQL Profiler will provide the order of execution, but I am still waiting for access so I can analyse this.
One more thing I found on Google is that a Red Gate tool can be used.
Is there any script or other solution to perform this task in a simple way?
I am very sorry if this is a common question; this is my first time working on a real-world SQL scenario, but I do have knowledge of SQL.
Why don't you generate those records in SQL Server? Here is a script to generate a table variable with 1,000,000 rows:
DECLARE @values TABLE (DataValue int, RandValue INT)

;WITH mycte AS
(
    SELECT 1 DataValue
    UNION ALL
    SELECT DataValue + 1
    FROM mycte
    WHERE DataValue + 1 <= 1000000
)
INSERT INTO @values (DataValue, RandValue)
SELECT
    DataValue,
    convert(int, convert(varbinary(4), NEWID(), 1)) AS RandValue
FROM mycte m
OPTION (MAXRECURSION 0)

SELECT
    v.DataValue,
    v.RandValue,
    (SELECT TOP 1 [User_ID] FROM tblUsers ORDER BY NEWID())
FROM @values v
In the table variable @values you will have a random int value (column RandValue) which can be used to generate values for other columns. You also have an example of getting a random foreign key.
Below is a simple procedure I wrote to insert millions of dummy records into a table. I know it's not the most efficient, but it serves the purpose; for a million records it takes around 5 minutes. You need to pass the number of records you want to generate when executing the procedure.
IF EXISTS (SELECT 1 FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[DUMMY_INSERT]') AND type IN (N'P', N'PC'))
BEGIN
    DROP PROCEDURE DUMMY_INSERT
END
GO
CREATE PROCEDURE DUMMY_INSERT (
    @noOfRecords INT
)
AS
BEGIN
    DECLARE @count int
    SET @count = 1;
    WHILE (@count <= @noOfRecords)
    BEGIN
        INSERT INTO [dbo].[LogTable] ([UserId],[UserName],[Priority],[CmdName],[Message],[Success],[StartTime],[EndTime],[RemoteAddress],[TId])
        VALUES(1, 'user_' + CAST(@count AS VARCHAR(256)), 1, 'dummy command', 'dummy message.', 0, convert(varchar(50), dateadd(D, Round(RAND() * 1000, 1), getdate()), 121), convert(varchar(50), dateadd(D, Round(RAND() * 1000, 1), getdate()), 121), '160.200.45.1', 1);
        SET @count = @count + 1;
    END
END
You can use a cursor to repeat data.
For example, this simple code:
Declare @SYMBOL nchar(255), --sample V
        @SY_ID int          --sample V
Declare R2 Cursor
For SELECT [ColumnsName]
    FROM [TableName]
    For Read Only;
Open R2
Fetch Next From R2 INTO @SYMBOL, @SY_ID
While (@@FETCH_STATUS <> -1)
Begin
    Insert INTO [TableName] ([ColumnsName])
    Values (@SYMBOL, @SY_ID)
    Fetch Next From R2 INTO @SYMBOL, @SY_ID
End
Close R2
Deallocate R2
/*wait a ... moment*/
SELECT COUNT(*) --check result
FROM [TableName]

Find Unused ID in Database using C# and SQL

I am trying to code a simple database management tool in C#. I am in the process of coding a function to insert a new row into the database, but I have run into a problem. I need to be able to detect which ID numbers are not already taken. I have done some research but haven't found any clear answers.
Example table:
ID Name
---------------
1 John
2 Linda
4 Mark
5 Jessica
How would I add a function that automatically detects that ID 3 is empty, and places a new entry there?
Edit: My real question is: when I want to insert a new row via C#, how do I handle a column which is auto-increment? An example would be fantastic :)
I don't like giving answers like this...but I am going to anyway on this occasion.
Don't
What if you store more data in another table which has a foreign key to the ID in this table? If you reuse numbers you are asking for trouble with referential integrity down the line.
I assume your field is an int? If so, an auto increment should give more than enough for most purposes. It makes your insert simpler, and maintains integrity.
Edit: You might have a very good reason to do it, but I wanted to make the point in case somebody comes along and sees this later on who thinks it is a good idea.
SQL:
SELECT ID From TABLE
OR
SELECT t.ID
FROM ( SELECT number + 1 AS ID
FROM master.dbo.spt_values
WHERE Type = 'p'
AND number <= ( SELECT MAX(ID) - 1
FROM #Table
)
) t
LEFT JOIN #Table ON t.ID = [#Table].ID
WHERE [#Table].ID IS NULL
C#
DataTable dt = new DataTable();
//Populate Dt with SQL
var tableInts = dt.Rows.Cast<DataRow>().Select(row => row.Field<int>("ID")).ToList<int>();
var allInts = Enumerable.Range(1, tableInts.Max()).ToList();
var minInt = allInts.Except(tableInts).Min();
SELECT #temp.Id
FROM #temp
LEFT JOIN table1 ON #temp.Id = table1.Id
WHERE table1.Id IS NULL
Try this?
But my suggestion is: just auto-increment the field.
You do that by setting the IDENTITY property of the column and making it the primary key too (not null).
To handle inserts you might need triggers, which are like stored procedures but act instead of, or after, an insert, update or delete.
Google triggers.
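For reference, a minimal sketch of that setup in T-SQL (the table and column names here are made up for illustration): the ID column is declared as IDENTITY and simply omitted from the INSERT, whether the command is issued in SSMS or from C# through a SqlCommand, and SCOPE_IDENTITY() returns the value that was just generated.
-- Sketch only: dbo.People is a hypothetical table.
CREATE TABLE dbo.People
(
    ID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- auto-increment primary key
    Name nvarchar(50) NOT NULL
);

INSERT INTO dbo.People (Name) VALUES (N'John');   -- ID is generated automatically
SELECT SCOPE_IDENTITY() AS NewID;                 -- the ID assigned by this insert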
from How do I find a "gap" in running counter with SQL?
select
MIN(ID)
from (
select
0 ID
union all
select
[YourIdColumn]+1
from
[YourTable]
where
--Filter the rest of your key--
) foo
left join
[YourTable]
on [YourIdColumn]=ID
and --Filter the rest of your key--
where
[YourIdColumn] is null

Procedure returning a list of identity [duplicate]

I am inserting records through a query similar to this one:
insert into tbl_xyz select field1 from tbl_abc
Now I would like to retrieve the newly generated IDENTITY values of the inserted records. How do I do this with the minimum amount of locking and maximum reliability?
You can get this information using the OUTPUT clause.
You can output your information to a temp target table or view.
Here's an example:
DECLARE @InsertedIDs TABLE (ID bigint)
INSERT INTO DestTable (col1, col2, col3, col4)
OUTPUT INSERTED.ID INTO @InsertedIDs
SELECT col1, col2, col3, col4 FROM SourceTable
You can then query the table variable @InsertedIDs for your inserted IDs.
@@IDENTITY will return the last inserted IDENTITY value, so you have two possible problems:
Beware of triggers executed when inserting into tbl_xyz, as this may change the value of @@IDENTITY.
Does tbl_abc have more than one row? If so, @@IDENTITY will only return the identity value of the last row.
Issue 1 can be resolved by using SCOPE_IDENTITY() instead of @@IDENTITY.
Issue 2 is harder to resolve. Does field1 in tbl_abc define a unique record within tbl_xyz? If so, you could reselect the data from tbl_xyz with the identity column. There are other solutions using CURSORS, but these will be slow.
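As a quick illustration of the difference between the two functions (a sketch only, inserting a single made-up row into tbl_xyz from the question):
INSERT INTO tbl_xyz (field1) VALUES ('example');

SELECT SCOPE_IDENTITY() AS LastIdThisScope, -- not affected by inserts done inside triggers
       @@IDENTITY       AS LastIdAnyScope;  -- may return an identity generated by a trigger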
SELECT @@IDENTITY
This is how I've done it before. Not sure if this will meet the latter half of your post though.
EDIT
Found this link too, but not sure if it is the same...
How to insert multiple records and get the identity value?
As far as I know, you can't really do this with straight SQL in the same script. But you could create an INSERT trigger. Now, I hate triggers, but it's one way of doing it.
Depending on what you are trying to do, you might want to insert the rows into a temp table or table variable first, and deal with the result set that way. Hopefully, there is a unique column that you can link to.
You could also lock the table, get the max key, insert your rows, and then get your max key again and do a range.
Trigger:
--Use the Inserted table. This contains all of the inserted rows.
SELECT * FROM Inserted
Temp Table:
select field1, unique_col into #temp from tbl_abc
insert into tbl_xyz (field1, unique_col) select field1, unique_col from tbl_abc
--This could be an update, or a cursor, or whatever you want to do
SELECT * FROM tbl_xyz WHERE EXISTS (SELECT top 1 unique_col FROM #temp WHERE unique_col = tbl_xyz.unique_col)
Key Range:
Declare @minkey as int, @maxkey as int
BEGIN TRAN --You have to lock the table for this to work
--key is the name of your identity column
SELECT @minkey = MAX([key]) FROM tbl_xyz
insert into tbl_xyz select field1 from tbl_abc
SELECT @maxkey = MAX([key]) FROM tbl_xyz
COMMIT TRAN
SELECT * FROM tbl_xyz WHERE [key] > @minkey AND [key] <= @maxkey

How to prevent duplicate records being inserted with SqlBulkCopy when there is no primary key

I receive a daily XML file that contains thousands of records, each being a business transaction that I need to store in an internal database for use in reporting and billing.
I was under the impression that each day's file contained only unique records, but have discovered that my definition of unique is not exactly the same as the provider's.
The current application that imports this data is a C# .NET 3.5 console application; it uses SqlBulkCopy into a MS SQL Server 2008 database table whose columns exactly match the structure of the XML records. Each record has just over 100 fields, and there is no natural key in the data; or rather, the fields that would make sense as a composite key also end up having to allow nulls. Currently the table has several indexes, but no primary key.
Basically the entire row needs to be unique. If one field is different, it is valid enough to be inserted. I looked at creating an MD5 hash of the entire row, inserting that into the database and using a constraint to prevent SqlBulkCopy from inserting the row, but I don't see how to get the MD5 hash into the bulk copy operation, and I'm not sure whether the whole operation would fail and roll back if any one record failed, or whether it would continue.
The file contains a very large number of records, so going row by row through the XML, querying the database for a record that matches all fields and then deciding whether to insert is really the only way I can see of doing this. I was just hoping not to have to rewrite the application entirely, and the bulk copy operation is so much faster.
Does anyone know of a way to use SqlBulkCopy while preventing duplicate rows, without a primary key? Or any suggestion for a different way to do this?
I'd upload the data into a staging table then deal with duplicates afterwards on copy to the final table.
For example, you can create a (non-unique) index on the staging table to deal with the "key"
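A rough sketch of that approach, with made-up table and column names (the real table has ~100 columns): SqlBulkCopy writes into the staging table exactly as it writes into the destination today, and one INSERT ... EXCEPT statement then moves across only the rows that are not already in the destination. EXCEPT also removes duplicates within the file itself and treats NULLs as equal when comparing, which matters here since the candidate key columns allow NULL.
-- Sketch only: dbo.Transactions and dbo.Staging_Transactions are placeholder names.
INSERT INTO dbo.Transactions (Col1, Col2, Col3)          -- list every column in practice
SELECT Col1, Col2, Col3 FROM dbo.Staging_Transactions
EXCEPT
SELECT Col1, Col2, Col3 FROM dbo.Transactions;           -- drop rows already present

TRUNCATE TABLE dbo.Staging_Transactions;                 -- empty the staging table for the next file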
Given that you're using SQL 2008, you have two options to solve the problem easily without needing to change your application much (if at all).
The first possible solution is to create a second table like the first one, but with a surrogate identity key and a uniqueness constraint added using the ignore_dup_key option, which will do all the heavy lifting of eliminating the duplicates for you.
Here's an example you can run in SSMS to see what's happening:
if object_id( 'tempdb..#test1' ) is not null drop table #test1;
if object_id( 'tempdb..#test2' ) is not null drop table #test2;
go
-- example heap table with duplicate record
create table #test1
(
col1 int
,col2 varchar(50)
,col3 char(3)
);
insert #test1( col1, col2, col3 )
values
( 250, 'Joe''s IT Consulting and Bait Shop', null )
,( 120, 'Mary''s Dry Cleaning and Taxidermy', 'ACK' )
,( 250, 'Joe''s IT Consulting and Bait Shop', null ) -- dup record
,( 666, 'The Honest Politician', 'LIE' )
,( 100, 'My Invisible Friend', 'WHO' )
;
go
-- secondary table for removing duplicates
create table #test2
(
sk int not null identity primary key
,col1 int
,col2 varchar(50)
,col3 char(3)
-- add a uniqueness constraint to filter dups
,constraint UQ_test2 unique ( col1, col2, col3 ) with ( ignore_dup_key = on )
);
go
-- insert all records from original table
-- this should generate a warning if duplicate records were ignored
insert #test2( col1, col2, col3 )
select col1, col2, col3
from #test1;
go
Alternatively, you can also remove the duplicates in-place without a second table, but the performance may be too slow for your needs. Here's the code for that example, also runnable in SSMS:
if object_id( 'tempdb..#test1' ) is not null drop table #test1;
go
-- example heap table with duplicate record
create table #test1
(
col1 int
,col2 varchar(50)
,col3 char(3)
);
insert #test1( col1, col2, col3 )
values
( 250, 'Joe''s IT Consulting and Bait Shop', null )
,( 120, 'Mary''s Dry Cleaning and Taxidermy', 'ACK' )
,( 250, 'Joe''s IT Consulting and Bait Shop', null ) -- dup record
,( 666, 'The Honest Politician', 'LIE' )
,( 100, 'My Invisible Friend', 'WHO' )
;
go
-- add temporary PK and index
alter table #test1 add sk int not null identity constraint PK_test1 primary key clustered;
create index IX_test1 on #test1( col1, col2, col3 );
go
-- note: rebuilding the indexes may or may not provide a performance benefit
alter index PK_test1 on #test1 rebuild;
alter index IX_test1 on #test1 rebuild;
go
-- remove duplicates
with ranks as
(
select
sk
,ordinal = row_number() over
(
-- put all the columns composing uniqueness into the partition
partition by col1, col2, col3
order by sk
)
from #test1
)
delete
from ranks
where ordinal > 1;
go
-- remove added columns
drop index IX_test1 on #test1;
alter table #test1 drop constraint PK_test1;
alter table #test1 drop column sk;
go
Why not, instead of a primary key, simply create an index and set
Ignore Duplicate Keys: YES
This will prevent any duplicate key from firing an error; the duplicate row simply will not be inserted (as it already exists).
I use this method to insert around 120,000 rows per day and it works flawlessly.
I would bulk copy into a temporary table and then push the data from that into the actual destination table. In this way, you can use SQL to check for and handle duplicates.
What is the data volume? You have 2 options that I can see:
1: filter it at source, by implementing your own IDataReader and using some hash over the data, and simply skipping any duplicates so that they never get passed into the TDS.
2: filter it in the DB; at the simplest level, I guess you could have multiple stages of import - the raw, unsanitised data - and then copy the DISTINCT data into your actual tables, perhaps using an intermediate table if you want to. You might want to use CHECKSUM for some of this, but it depends.
And fix that table. No table ever should be without a unique index, preferably as a PK. Even if you add a surrogate key because there is no natural key, you need to be able to specifically identify a particular record. Otherwise how will you get rid of the duplicates you already have?
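For what it's worth, adding a surrogate key to an existing table is a one-off change along these lines (dbo.Transactions is a placeholder name, not from the question):
-- Sketch only: give the heap a surrogate identity key so individual rows can be identified.
ALTER TABLE dbo.Transactions
    ADD TransactionId int IDENTITY(1,1) NOT NULL;

ALTER TABLE dbo.Transactions
    ADD CONSTRAINT PK_Transactions PRIMARY KEY CLUSTERED (TransactionId);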
I think this is a lot cleaner.
var dtcolumns = new string[] { "Col1", "Col2", "Col3" };
// Keep only distinct rows (by the listed columns) before bulk copying
var dtDistinct = dt.DefaultView.ToTable(true, dtcolumns);

using (SqlConnection cn = new SqlConnection(connectionString)) // connectionString: your connection string
using (SqlBulkCopy copy = new SqlBulkCopy(cn))
{
    cn.Open();
    copy.ColumnMappings.Add(0, 0);
    copy.ColumnMappings.Add(1, 1);
    copy.ColumnMappings.Add(2, 2);
    copy.DestinationTableName = "TableNameToMapTo";
    copy.WriteToServer(dtDistinct);
}
This way you only need one database table and can keep the business logic in code.
