How to insert huge dummy data into SQL Server - C#

The development team has finished their application, and as a tester I need to insert 1,000,000 records into its 20 tables for performance testing.
I went through the tables, and there are relationships between all of them.
To insert that much dummy data I would need to understand the application completely in a very short span of time, and I don't have any dummy data yet either.
Is there any way in SQL Server to insert this much data? Please share your approaches.
Currently I am planning to create dummy data in Excel, but I am not sure about the relationships between the tables.
I found on Google that SQL Profiler will provide the order of execution, but I am still waiting for access so I can analyze this.
One more thing I found on Google is that a Red Gate tool can be used.
Is there any script or other solution to perform this task in a simple way?
I am very sorry if this is a common question; I am working on a real-world SQL scenario for the first time, but I do have knowledge of SQL.

Why don't you generate those records in SQL Server? Here is a script that generates a table variable with 1,000,000 rows:
DECLARE @values TABLE (DataValue int, RandValue INT)

;WITH mycte AS
(
    SELECT 1 DataValue
    UNION ALL
    SELECT DataValue + 1
    FROM mycte
    WHERE DataValue + 1 <= 1000000
)
INSERT INTO @values (DataValue, RandValue)
SELECT
    DataValue,
    CONVERT(int, CONVERT(varbinary(4), NEWID(), 1)) AS RandValue
FROM mycte m
OPTION (MAXRECURSION 0)

SELECT
    v.DataValue,
    v.RandValue,
    (SELECT TOP 1 [User_ID] FROM tblUsers ORDER BY NEWID())
FROM @values v
In the table variable @values you will have a random int value (column RandValue) which can be used to generate values for other columns. You also have an example of getting a random foreign key.
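As a sketch of how the generated rows and the random-foreign-key pattern might feed a real child table (the table tblOrders and its columns are hypothetical), running in the same batch as the script above:
-- Hypothetical child table tblOrders(User_ID, Amount): each generated row
-- picks a random existing user as its foreign key, reusing the pattern above.
INSERT INTO tblOrders (User_ID, Amount)
SELECT
    (SELECT TOP 1 [User_ID] FROM tblUsers ORDER BY NEWID()), -- random parent key
    ABS(v.RandValue % 1000)                                  -- derive a dummy amount
FROM @values v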

Below is a simple procedure I wrote to insert millions of dummy records into a table. I know it's not the most efficient approach, but it serves the purpose; for a million records it takes around 5 minutes. You need to pass the number of records to generate when executing the procedure.
IF EXISTS (SELECT 1 FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[DUMMY_INSERT]') AND type IN (N'P', N'PC'))
BEGIN
    DROP PROCEDURE DUMMY_INSERT
END
GO
CREATE PROCEDURE DUMMY_INSERT (
    @noOfRecords INT
)
AS
BEGIN
    DECLARE @count int
    SET @count = 1;
    -- <= so that exactly @noOfRecords rows are inserted
    WHILE (@count <= @noOfRecords)
    BEGIN
        INSERT INTO [dbo].[LogTable] ([UserId],[UserName],[Priority],[CmdName],[Message],[Success],[StartTime],[EndTime],[RemoteAddress],[TId])
        VALUES (1, 'user_' + CAST(@count AS VARCHAR(256)), 1, 'dummy command', 'dummy message.', 0,
                CONVERT(varchar(50), DATEADD(D, ROUND(RAND() * 1000, 1), GETDATE()), 121),
                CONVERT(varchar(50), DATEADD(D, ROUND(RAND() * 1000, 1), GETDATE()), 121),
                '160.200.45.1', 1);
        SET @count = @count + 1;
    END
END
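A usage example; wrapping the call in an explicit transaction is a common way to speed up loops of single-row inserts, since each INSERT otherwise commits individually:
-- Generate 1,000,000 dummy log rows in a single transaction.
BEGIN TRANSACTION;
EXEC DUMMY_INSERT @noOfRecords = 1000000;
COMMIT TRANSACTION;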

You can use a cursor to read the existing rows and re-insert them, which repeats the data. For example, this simple code:
Declare @SYMBOL nchar(255), -- sample variables matching [ColumnsName]
        @SY_ID int
Declare R2 Cursor
For SELECT [ColumnsName]
    FROM [TableName]
For Read Only;
Open R2
Fetch Next From R2 INTO @SYMBOL, @SY_ID
While (@@FETCH_STATUS <> -1)
Begin
    Insert INTO [TableName] ([ColumnsName])
    Values (@SYMBOL, @SY_ID)
    Fetch Next From R2 INTO @SYMBOL, @SY_ID
End
Close R2
Deallocate R2
/* wait a ... moment */
SELECT COUNT(*) -- check result
FROM [TableName]
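As a side note, the same duplication can usually be done set-based without a cursor, which is much faster for large volumes; a minimal sketch with the same placeholder names:
-- Double the table's contents in one set-based statement.
Insert INTO [TableName] ([ColumnsName])
SELECT [ColumnsName]
FROM [TableName];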

Related

How to delete data faster from large table in SQL Server?

I have a huge table (log) which keeps some history data. It has more than 10 columns:
Id, Year, Month, Day, data1, data2, data3, ...
Because the table is huge, it has lots of indexes on it.
The system keeps inserting lots of new data into this table. However, because of the way the system works, duplicated data is sometimes inserted (only the id differs). The ids of the duplicates (id only) are also inserted into another table (log_existing). We have another service which deletes the duplicates from both tables. Here is what we are doing now.
SET @TotalRows = 0;
SET @Rows = 0;
WHILE 1 = 1
BEGIN
    DECLARE @Ids TABLE (id BIGINT);
    INSERT INTO @Ids
    SELECT TOP (@BatchSize) Id
    FROM Log
    DELETE FROM Log WHERE Id IN (SELECT id FROM @Ids)
    DELETE FROM Log_Existing WHERE Id IN (SELECT id FROM @Ids)
    SET @Rows = @@ROWCOUNT
    IF (@Rows < @BatchSize)
    BEGIN
        BREAK;
    END
    SET @TotalRows = @TotalRows + @Rows
    IF (@TotalRows >= @DeleteSize)
    BEGIN
        BREAK;
    END
    SET @Rows = 0;
END
Basically, the service runs every 2 minutes (or 5 minutes, configurable) to run this batch delete, with @BatchSize = 2000 and @DeleteSize = 1000000; it usually runs for more than the 2/5 minutes.
It worked OK for some time, but now we realize that there are too many duplications and this process cannot delete them fast enough. So the database grows larger and larger, and the process gets slower and slower.
Is there a way to make it faster? Or some kind of guideline?
Thanks
I would try to avoid inserting the duplicates into the Log table in the first place. From your description this should be possible, since some combination of columns (besides the Id) makes an entry unique.
One option is the IGNORE_DUP_KEY option on a unique index. When such an index exists and an INSERT statement tries to insert a row that violates the index's unique constraint, the INSERT is ignored. See the Microsoft SQL Server documentation.
CREATE TABLE #Test (C1 nvarchar(10), C2 nvarchar(50), C3 datetime);
GO
CREATE UNIQUE INDEX AK_Index ON #Test (C2)
WITH (IGNORE_DUP_KEY = ON);
GO
INSERT INTO #Test VALUES (N'OC', N'Ounces', GETDATE());
INSERT INTO #Test SELECT * FROM Production.UnitMeasure;
GO
SELECT COUNT(*) AS [Number of rows] FROM #Test;
GO
DROP TABLE #Test;
GO
I think if you use a DELETE statement with a JOIN clause it should do better. Note that T-SQL cannot delete from two tables in a single statement, so it takes one DELETE per table (assuming, as described above, that every row in Log_Existing records a duplicate id):
DELETE L
FROM Log AS L
INNER JOIN Log_Existing AS E ON L.LOGID = E.LOGID;

DELETE FROM Log_Existing;

How can I elegantly modify the number of affected rows returned by a Stored Procedure to ExecuteNonQuery() via the DONE_IN_PROC token?

I use a custom ORM generator that calls stored procedures, and validates the number of rows affected by UPDATE and DELETE statements using ExecuteNonQuery(), along these lines:
// Execute the stored procedure and get the number of rows affected.
int result = command.ExecuteNonQuery();
// Exactly one row was deleted, as expected.
if (result == 1)
return true;
// No rows were deleted. Maybe another user beat us to it. Fine.
if (result == 0)
return false;
// We don't know how many rows were deleted. Oh well.
if (result == -1)
return false;
// Something awful has happened; it's probably a bug in the stored procedure.
throw new Exception("Too many rows were deleted!");
When my stored procedures are mundane T-SQL updates and deletes against local tables, this system works fine.
CREATE PROCEDURE [widgets].[Update]
    @WidgetID int,
    @NewName varchar(10)
AS
BEGIN
    UPDATE Widgets SET Name = @NewName WHERE WidgetID = @WidgetID
END
However, sometimes I need to EXEC against a Linked Server:
CREATE PROCEDURE [widgets].[Update]
    @WidgetID int,
    @NewName varchar(10)
AS
BEGIN
    DECLARE @OpenQuery varchar(max)
    SET @OpenQuery = 'execute function mydata:widgets_Update(' + CAST(@WidgetID as varchar()) + ',''' + @NewName + ''')'
    DECLARE @Query varchar(max)
    SET @Query = 'SELECT * FROM OPENQUERY(INFORMIX, ''' + @OpenQuery + ''')'
    EXEC (@Query)
END
If I'm not directly issuing INSERT, UPDATE or DELETE statements in T-SQL, SQL Server (by design) returns the value -1 to ExecuteNonQuery() via the DONE_IN_PROC token. My ORM code can't do anything useful with this, so I'm willing to cheat a little.
First, I modify the remote query on the linked server to return the number of affected rows as an integer. For the stored procedure widgets_Update() on my remote Informix server, for example, I'll add this to the end:
-- Return the number of rows affected.
return DBINFO('sqlca.sqlerrd2');
Then I consume that number in order to fake out the @@ROWCOUNT/DONE_IN_PROC mechanism:
-- Turn off row counts for the moment.
SET NOCOUNT ON
-- Create a dummy table to get the result from EXEC into a local variable.
DECLARE @Rowcount TABLE (n int)
INSERT @Rowcount EXEC (@Query)
DECLARE @N int = (SELECT n FROM @Rowcount)
-- Create a dummy table to receive the inserted rows.
DECLARE @Table TABLE (n int)
-- Modify the number of affected rows returned in the DONE_IN_PROC token by inserting exactly the right number of dummy rows into the dummy table.
SET NOCOUNT OFF
INSERT @Table SELECT * FROM ModifyROWCOUNT(@N)
The inline table-valued function ModifyROWCOUNT() simply generates empty rows (in the spirit of a numbers table) on the fly, using code I cribbed from another answer:
CREATE FUNCTION [dbo].[ModifyROWCOUNT]
(
    @Rowcount int
)
RETURNS TABLE
AS
RETURN
(
    WITH
    L0   AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)), -- 2^1
    L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),       -- 2^2
    L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),       -- 2^4
    L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),       -- 2^8
    L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),       -- 2^16
    L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),       -- 2^32
    Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS k FROM L5)
    SELECT NULL AS n
    FROM Nums
    WHERE k <= @Rowcount
)
So far, this approach seems to work, and performs just fine; however, I've only tested it when updating or deleting one record at a time. Fortunately, this is the primary use case. My concern is that the code-to-effect ratio is so high that this solution smells bad to me, and I have a significant number of remote queries to build out this way.
My question is this: If my stored procedure must use EXEC and OPENQUERY to insert, update or delete records on a remote server, is there a superior way to return the actual affected row count from T-SQL so I can consume it with ExecuteNonQuery()?
Please assume that I can't add OUTPUT parameters to these stored procedures, which would require the use of ExecuteScalar() instead. And note that there are mechanisms (namely SELECT statements) that will modify @@ROWCOUNT, but still return -1 in the DONE_IN_PROC token.

SQL: Pull List Of Tables From Specified Database While Attached To Another

I am facing a peculiar issue with loading a list of tables from a specific database (or rather a group of databases) while attached to the master database. Currently my query loads all of the databases on the server, then loops through them, sending information back to the client via RAISERROR. While this loop is executing, I need a nested loop to load all of the tables for the current database, to be returned as a SELECT once the query has completed. The issue I'm running into is that this will be executed as a single query inside C# code. Ideally I would like to load everything in SQL and return it to the client for processing. For example:
WHILE (@dbLoop < @dbCount) BEGIN
    -- Do cool things and send details back to client.
    SET @dbName = (SELECT _name FROM dbTemp WHERE _id = @dbLoop);
    -- USE [@dbName]
    -- Get a count of the tables from info schema on the newly specified database.
    WHILE (@tableLoop < @tableCount) BEGIN
        -- USE [@dbName]
        -- Do super cool things and load tables from info schema.
        SET @tableLoop += 1;
    END
    SET @dbLoop += 1;
END
-- Return the list of tables from all databases to the client for use with SqlDataAdapter.
SELECT * FROM tableTemp;
This topic is pretty straightforward; I just need a way to access tables in a specified database (preferably by name) without having to change the connection on the SqlConnection object, and without having a loop inside my C# code that runs the same query against each database. It would be more efficient to load everything in SQL and send it back to the application. Any help that can be provided on this would be great!
Thanks,
Jamie
All the tables are in the metadata; you can just run a query against that and join to the list of schemas you want to look at.
SELECT tab.name
FROM sys.tables AS tab
JOIN sys.schemas AS sch on tab.schema_id = sch.schema_id
JOIN dbTemp temp on sch.name = temp.[_name]
This returns a list of the tables to send back as a result set.
The statement USE [@dbName] takes effect only AFTER it is run as its own batch (usually terminated by the GO statement):
USE [@dbName]
GO
The above 2 lines would make you start using the new database. You cannot use this in the middle of your SQL or stored procedure (and USE does not accept a variable for the database name anyway).
One other option you can use is dot notation, i.e., the dbname..tablename syntax, to query your tables.
double dot notation post
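For example (the database and table names here are placeholders), a multi-part name lets you query another database's tables and metadata without changing the connection:
-- Query a table in another database via dot notation (omitting the schema
-- between the two dots falls back to your default schema, usually dbo).
SELECT * FROM OtherDb..SomeTable;

-- The same idea works for metadata:
SELECT name FROM OtherDb.sys.tables;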
Okay, after spending all day working on this, I have finally come up with a solution. I load all the databases into a table variable, then loop through them and send their details back to the client. After the database details have been sent via RAISERROR, I use sp_executesql to execute a sub-query against the current database to get its list of tables for processing at the end of the primary query. The example below demonstrates the basic structure of this process for others experiencing this issue in the future.
Thank you all once again for your help!
-Jamie
DECLARE @LoopCounter INT = 1, @DatabaseCount INT = 0;
DECLARE @SQL NVARCHAR(MAX), @dbName NVARCHAR(MAX);
DECLARE @Databases TABLE ( _id INT, _name NVARCHAR(MAX) );
DECLARE @Tables TABLE ( _name NVARCHAR(MAX), _type NVARCHAR(15) );

INSERT INTO @Databases
SELECT ROW_NUMBER() OVER (ORDER BY name) AS id, name
FROM sys.databases
WHERE name NOT IN ( 'master', 'tempdb', 'msdb', 'model' );

SET @DatabaseCount = (SELECT COUNT(*) FROM @Databases);

WHILE (@LoopCounter <= @DatabaseCount) BEGIN
    SET @dbName = (SELECT _name FROM @Databases WHERE _id = @LoopCounter);
    SET @SQL = 'SELECT TABLE_NAME, TABLE_TYPE
                FROM [' + @dbName + '].INFORMATION_SCHEMA.TABLES';
    INSERT INTO @Tables EXEC sp_executesql @SQL;
    SET @LoopCounter += 1;
END

-- Return the collected table list to the client.
SELECT * FROM @Tables;

SQL Server 2008: re-increment table after deletion

Using SQL Server 2008 and MS Visual Studio 2012, C# .NET 4.5.
I asked a similar question last week that was solved with the following query:
DECLARE @from int = 9, @to int = 3

UPDATE MainPareto
SET pareto = m.new_pareto
FROM (
    SELECT pKey, -- this is your primary key for the table
           new_pareto = ROW_NUMBER()
               OVER (ORDER BY CASE WHEN pareto = @from THEN @to ELSE pareto END,
                     CASE WHEN pareto = @from THEN 0 ELSE 1 END)
    FROM MainPareto
    -- put in any conditions that you want to restrict the scores by.
    WHERE PG = @pg AND pareto IS NOT NULL
    -- end conditions
) AS m
INNER JOIN MainPareto ON MainPareto.pKey = m.pKey
WHERE MainPareto.pareto <> m.new_pareto
As you can see this works great; it increments the "league" when changes are made.
Now, after some added functionality, a user has requested deletion and recovery.
On my WinForm the user can right-click the grid and delete the "part" number.
The user can also recover it if needed.
However, I need a stored procedure that will re-sort the grid and update it, like the method above does, after a deletion has been made by another stored procedure. My WinForm will sort that part out, but I do need a procedure that can do what my current one does for a deletion.
Hope you guys understand; if not, ask me and I'll try to clarify as best I can.
I am not totally sure if this is what you are looking for, but this is how you can reseed your primary key column (if your primary key is also an identity). Notice how the insert after the truncate does not include column 1 (the primary key column).
select *
into #temp
from MainPareto
truncate table MainPareto
insert into MainPareto (col2, col3, col4) --...
select col2, col3, col4 --...
from #temp
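As a side note, if the goal is only to reset the identity seed (rather than rewriting the surviving rows), DBCC CHECKIDENT can do that without copying the table out and back:
-- Reset the identity seed so the next inserted row gets identity value 1.
DBCC CHECKIDENT ('MainPareto', RESEED, 0);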

Creating a SQL table from a comma concatenated list

I am running SQL Server and I have a stored procedure. I want to do a SELECT statement with a WHERE ... IN clause. I don't know how long the list will be, so right now I have tried something as follows:
SELECT * FROM table1 WHERE id IN (@idList)
In this solution @idList is a varchar(max), but this doesn't work. I heard about passing in table values, but I am confused about how to do that. Any help would be great.
I would suggest using a function to split the incoming list (use the link that Martin put in his comment).
Store the results of the split function in a temporary table or table variable and join it in your query instead of using the WHERE clause:
select * into #ids from dbo.Split(',', @idList)

select t.*
from table1 t
join #ids i
  on t.id = i.s
The most efficient way would be to pass in a table valued parameter (if you're on SQL Server 2008), or an XML parameter (if you're on SQL Server 2005/2000). If your list is small (and you're on SQL Server 2005/2000), passing in your list as a comma (or otherwise) delimited list and using a split function to divide the values out into rows in a temporary table is also an option.
Whichever option you use, you would then join this table (either the table parameter, the table resulting from the XML select, or the temporary table created by the values from the split) to your main query.
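As a sketch of the table-valued parameter approach on SQL Server 2008+ (the type and procedure names here are made up for illustration; from ADO.NET the parameter is passed as a DataTable with SqlDbType.Structured):
-- A user-defined table type to carry the id list.
CREATE TYPE dbo.IdList AS TABLE (id int PRIMARY KEY);
GO
-- A procedure that joins the incoming ids against the target table.
CREATE PROCEDURE dbo.GetByIdList
    @ids dbo.IdList READONLY
AS
BEGIN
    SELECT t.*
    FROM table1 t
    JOIN @ids i ON t.id = i.id;
END
GO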
Here is a table-valued function that takes an nvarchar and returns a table to join on:
Create function [ReturnValues]
(
    @Values nvarchar(4000)
)
Returns @ValueTable table (Value nvarchar(2000))
As
Begin
    Declare @Start int
    Declare @End int
    Set @Start = 1
    Set @End = 1
    While @Start <= len(@Values)
    Begin
        Set @End = charindex(',', @Values, @Start)
        If @End = 0
            Set @End = len(@Values) + 1
        Insert into @ValueTable
        Select rtrim(ltrim(substring(@Values, @Start, @End - @Start)))
        Set @Start = @End + 1
    End
    Return
End
GO
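Usage is then a plain join against the function's result (casting the split value to match the id column's type):
Declare @idList nvarchar(4000) = N'1, 2, 3'
Select t.*
From table1 t
Join dbo.ReturnValues(@idList) v
  On t.id = cast(v.Value as int)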
Binding an @idList parameter as you suggested is not possible in SQL.
The best approach would be bulk inserting the ids into a separate table and then querying that table using a subselect, or joining on the ids.
e.g.
INSERT INTO idTable (id, context) VALUES (@idValue, 1);
INSERT INTO idTable (id, context) VALUES (@idValue, 1);
INSERT INTO idTable (id, context) VALUES (@idValue, 1); -- as often as you like
SELECT * FROM table1, idTable WHERE table1.id = idTable.id AND idTable.context = 1
The context must be a unique value that identifies the id range. That is important for running the stored procedure in parallel. Without the context information, parallel executions of the stored procedure would mix the values from different selections.
If the number of parameters is reasonably small (< 100) you can use several parameters:
SELECT * FROM table1 WHERE id IN (@id1, @id2, @id3)
If the list is longer, look for a split function.
