I'm trying to insert records using a high performance table parameter method ( http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/ ), and I'm curious if it's possible to retrieve back the identity values for each record I insert.
At the moment, the answer appears to be no - I insert the data, then retrieve back the identity values, and they don't match. Specifically, they don't match about 75% of the time, and they don't match in unpredictable ways. Here's some code that replicates this issue:
using System.Data;
using System.Data.SqlClient;

// Create a DataTable with 100k rows
DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("item_id", typeof(int)));
dt.Columns.Add(new DataColumn("comment", typeof(string)));
for (int i = 0; i < 100000; i++) {
    dt.Rows.Add(new object[] { 0, i.ToString() });
}
// Insert these records and retrieve the identity values
using (SqlConnection conn = new SqlConnection("Data Source=localhost;Initial Catalog=testdb;Integrated Security=True")) {
    conn.Open();
    using (SqlCommand cmd = new SqlCommand("proc_bulk_insert_test", conn)) {
        cmd.CommandType = CommandType.StoredProcedure;
        // Adding a "structured" parameter allows you to insert tons of data with low overhead
        SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
        param.Value = dt;
        cmd.Parameters.Add(param);
        SqlDataReader dr = cmd.ExecuteReader();
        // Set all the records' identity values
        int i = 0;
        while (dr.Read()) {
            dt.Rows[i].ItemArray = new object[] { dr.GetInt32(0), dt.Rows[i].ItemArray[1] };
            i++;
        }
        dr.Close();
    }
    // Do all the records' ID numbers match what I received back from the database?
    using (SqlCommand cmd = new SqlCommand("SELECT * FROM bulk_insert_test WHERE item_id >= @base_identity ORDER BY item_id ASC", conn)) {
        cmd.Parameters.AddWithValue("@base_identity", (int)dt.Rows[0].ItemArray[0]);
        SqlDataReader dr = cmd.ExecuteReader();
        DataTable dtresult = new DataTable();
        dtresult.Load(dr);
    }
}
The database is defined using this SQL Server script:
CREATE TABLE bulk_insert_test (
    item_id int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
    comment varchar(20)
)
GO
CREATE TYPE bulk_insert_table_type AS TABLE ( item_id int, comment varchar(20) )
GO
CREATE PROCEDURE proc_bulk_insert_test
    @mytable bulk_insert_table_type READONLY
AS
DECLARE @TableOfIdentities TABLE (IdentValue INT)
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id INTO @TableOfIdentities(IdentValue)
SELECT comment FROM @mytable
SELECT * FROM @TableOfIdentities
Here's the problem: the values returned from proc_bulk_insert_test are not in the same order as the original records were inserted. Therefore, I can't programmatically assign each record the item_id value I received back from the OUTPUT statement.
It seems like the only valid solution is to SELECT back the entire list of records I just inserted, but frankly I'd prefer any solution that would reduce the amount of data piped across my SQL Server's network card. Does anyone have better solutions for large inserts while still retrieving identity values?
EDIT: Let me try clarifying the question a bit more. The problem is that I would like my C# program to learn what identity values SQL Server assigned to the data that I just inserted. The order isn't essential; but I would like to be able to take an arbitrary set of records within C#, insert them using the fast table parameter method, and then assign their auto-generated ID numbers in C# without having to requery the entire table back into memory.
Given that this is an artificial test set, I tried to condense it into as small and readable a piece of code as possible. Let me describe the methods I have used to resolve this issue:
In my original code, in the application this example came from, I would insert about 15 million rows using 15 million individual insert statements, retrieving back the identity value after each insert. This worked but was slow.
I revised the code to use high-performance table parameters for insertion. I would then dispose of all of the objects in C# and read the entire objects back from the database. However, the original records had dozens of columns with lots of varchar and decimal values, so this method was very network-intensive, although it was fast and it worked.
I then began researching whether it was possible to use the table parameter insert while asking SQL Server to report back only the identity values. I tried scope_identity() and OUTPUT but haven't been successful so far with either.
Basically, this problem would be solved if SQL Server always inserted the records in exactly the order I provided them. Is it possible to make SQL Server insert records in exactly the order they are provided in a table-valued parameter insert?
EDIT2: This approach seems very similar to what Cade Roux cites below:
http://www.sqlteam.com/article/using-the-output-clause-to-capture-identity-values-on-multi-row-inserts
However, in the article, the author uses a magic unique value, "ProductNumber", to connect the inserted information from the "output" value to the original table value parameter. I'm trying to figure out how to do this if my table doesn't have a magic unique value.
Your TVP is an unordered set, just like a regular table. It only has order when you specify as such. Not only do you not have any way to indicate actual order here, you're also just doing a SELECT * at the end with no ORDER BY. What order do you expect here? You've told SQL Server, effectively, that you don't care. That said, I implemented your code and had no problems getting the rows back in the right order. I modified the procedure slightly so that you can actually tell which identity value belongs to which comment:
DECLARE @TableOfIdentities TABLE (IdentValue INT, comment varchar(20))
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id, Inserted.comment
INTO @TableOfIdentities(IdentValue, comment)
SELECT comment FROM @mytable
SELECT * FROM @TableOfIdentities
Then I called it using this code (we don't need all the C# for this):
DECLARE @t bulk_insert_table_type;
INSERT @t VALUES(5,'foo'),(2,'bar'),(3,'zzz');
SELECT * FROM @t;
EXEC dbo.proc_bulk_insert_test @t;
Results:
1 foo
2 bar
3 zzz
If you want to make sure the output is in the order of identity assignment (which isn't necessarily the same "order" that your unordered TVP has), you can add ORDER BY item_id to the last select in your procedure.
If you want to insert into the destination table so that your identity values are in an order that is important to you, then you have a couple of options:
Add a column to your TVP and insert the order into that column, then use a cursor to iterate over the rows in that order and insert them one at a time. Still more efficient than calling the entire procedure for each row, IMHO.
Add a column to your TVP that indicates order, and use an ORDER BY on the insert. This isn't guaranteed, but is relatively reliable, particularly if you eliminate parallelism issues using MAXDOP 1.
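Both options start by sending an explicit order with each row. A minimal sketch of that client-side step, assuming the table type gains an extra int column (called `seq` here, a hypothetical name) that records each row's position in the client's list. It only builds the DataTable passed as the structured parameter; the matching change to `bulk_insert_table_type` and the ORDER BY or cursor inside the procedure are not shown.

```csharp
using System.Collections.Generic;
using System.Data;

public static class TvpBuilder
{
    // Builds the TVP payload with a hypothetical "seq" column holding the
    // client-side position of each row, so the procedure can ORDER BY it.
    public static DataTable BuildWithOrdinal(IEnumerable<string> comments)
    {
        var dt = new DataTable();
        dt.Columns.Add(new DataColumn("seq", typeof(int)));      // client-assigned order
        dt.Columns.Add(new DataColumn("item_id", typeof(int)));  // placeholder, 0 until the server assigns an identity
        dt.Columns.Add(new DataColumn("comment", typeof(string)));

        int seq = 0;
        foreach (string comment in comments)
        {
            // Each row records the position it had in the client's list.
            dt.Rows.Add(seq++, 0, comment);
        }
        return dt;
    }
}
```

Because the ordinal travels with each row, the procedure can sort on it instead of relying on the TVP's undefined order.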
In any case, you seem to be placing a lot of relevance on ORDER. What does your order actually mean? If you want to place some meaning on order, you shouldn't be doing so using an IDENTITY column.
You specify no ORDER BY on this: SELECT * FROM @TableOfIdentities, so there's no guarantee of order. If you want them in the same order they were sent, INNER JOIN it to the data that was inserted, with an ORDER BY that matches the order in which the rows were sent.
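The client side of that matching can be sketched in C# as below. This assumes, hypothetically, that the procedure returns each new identity together with the client ordinal it belongs to. With a plain INSERT that takes some care, because INSERT's OUTPUT clause can only reference `inserted` columns, so the pairing has to come from a value that is unique per row (such as `comment` in the modified procedure above) or from a MERGE, whose OUTPUT may also reference source columns. `PendingRow` and `IdentityMapper` are illustrative names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical client-side record: Seq is the position the row had when it was sent.
public sealed class PendingRow
{
    public int Seq { get; set; }
    public string Comment { get; set; } = "";
    public int ItemId { get; set; }   // stays 0 until an identity is assigned
}

public static class IdentityMapper
{
    // Assigns identities by ordinal, so the order of 'returned' does not matter.
    public static void Assign(IList<PendingRow> rows, IEnumerable<(int Seq, int ItemId)> returned)
    {
        var bySeq = new Dictionary<int, PendingRow>();
        foreach (PendingRow pending in rows)
            bySeq.Add(pending.Seq, pending);   // throws if two rows share an ordinal

        foreach ((int seq, int itemId) in returned)
        {
            if (!bySeq.TryGetValue(seq, out var row))
                throw new InvalidOperationException($"Server returned unknown seq {seq}.");
            row.ItemId = itemId;
        }
    }
}
```

A dictionary lookup keeps the assignment linear in the number of rows, whatever order the server streams them back in.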
Related
I have the following SQL query which I am sending from a C# program:
DECLARE @id as int
SELECT @id = max(fault_catagory_ident) FROM fault_catagory_list
INSERT INTO fault_catagory_list (fault_catagory_ident, fault_catagory)
VALUES (@id + 1, 'TEST')
SELECT @id + 1
The 'fault_catagory' value is coming from my program, but the ident value needs to be the next number in line (primary key) from the existing table in the database. My C# code is parameterising values for security.
I have two problems:
How can I get the @id + 1 value returned to my program (ExecuteNonQuery doesn't return anything)?
How can I get @id as a parameterised value for the insert command?
I am wondering if my primary key could be automated in some way?
I want to carry all this out in one single query, as there is a risk of multiple logins running this same query. If any happened to run simultaneously, the @id value may get duplicated and one would fail.
Apologies if there isn't enough info here, I'm on a learning curve!
Any advice would be greatly appreciated. Thanks in advance.
I think that you will find everything you need in this example provided in MSDN:
https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand.executescalar?view=dotnet-plat-ext-5.0
in short:
to return a single value from a query, you'll use ExecuteScalar()
parameters are added to a query through the SqlCommand class provided in System.Data.SqlClient.
cheers!
Thanks for the advice, think I have found the solution...
Firstly I need to recreate the table with the primary key column using the IDENTITY property (makes sense to use this now I know it exists!). Found a guide here (though it will mean some rebuilding of primary/foreign key links) https://www.datameer.com/blog/how-to-add-an-identity-to-an-existing-column-in-sql/
Then in the C# program, use SqlCommand.ExecuteScalar to return the identity value:
int identReturn = 0;
identReturn = (int)cmd.ExecuteScalar();
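A small helper for the conversion step, sketched under one assumption worth checking: that the batch ends with something like `SELECT SCOPE_IDENTITY()`. `ExecuteScalar()` returns `object`, and `SCOPE_IDENTITY()` is typed `numeric(38,0)`, which arrives as a boxed `decimal`; in that case a direct `(int)` cast throws `InvalidCastException`. The cast is fine when the query returns an `int` column (for example via `OUTPUT INSERTED.<id column>`). `ScalarHelper` is an illustrative name.

```csharp
using System;

public static class ScalarHelper
{
    // Converts the object returned by SqlCommand.ExecuteScalar() to an int identity.
    public static int ToIdentity(object scalar)
    {
        // ExecuteScalar returns null when there is no row,
        // and DBNull.Value when the first column is SQL NULL.
        if (scalar is null || scalar is DBNull)
            throw new InvalidOperationException("The query did not return an identity value.");

        // Convert.ToInt32 accepts a boxed int, long or decimal;
        // a direct (int) cast only unboxes a boxed int.
        return Convert.ToInt32(scalar);
    }
}
```

Usage would be `int identReturn = ScalarHelper.ToIdentity(cmd.ExecuteScalar());`.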
Thanks again for the responses.
While I don't agree with the logic by which you are choosing identity values for the inserted rows, you could certainly achieve that using the IDENTITY property in the SQL table definition.
If your SQL command contains multiple SELECTs, every SELECT that returns a result set adds a new table to the DataSet passed to the data adapter (a SELECT that only assigns a variable returns nothing).
// An assignment such as SELECT @id = ... returns no result set, so @id is selected explicitly
string sql = @"
DECLARE @id as int
SELECT @id = max(fault_catagory_ident) FROM fault_catagory_list
SELECT @id
INSERT INTO fault_catagory_list (fault_catagory_ident, fault_catagory)
VALUES (@id + 1, 'TEST')
SELECT @id + 1";
SqlCommand cmd = new SqlCommand(sql, conn); // assuming you already set connection
SqlDataAdapter da = new SqlDataAdapter(cmd);
DataSet ds = new DataSet();
da.Fill(ds);
Console.WriteLine(ds.Tables[0].Rows[0][0]); // this will print @id
Console.WriteLine(ds.Tables[1].Rows[0][0]); // this will print @id + 1
Language: C#; DB: Access (so, OleDbDataAdapter)
Context:
In my current project I'm building a sort of SQL query builder. Depending on what the user selects from a drop-down list, an SQL command can look like the one below (not sure if that's relevant to the question). [] marks columns/tables and () marks drop-down options; the parentheses aren't actually in the query:
Select * From [db] where [Date] (between) @value1 and @value2 AND [ID] (=) @ID AND [Usd] (=) @Usd
Well, you get the idea.
I run this through the following code:
sbuilder = new StringBuilder();
sbuilder.Append("Select * FROM ").Append(Current_Table).Append(" WHERE ");
using (OleDbConnection connection = new OleDbConnection(Con))
{
    connection.Open();
    string query = sbuilder.ToString();
    OleDbDataAdapter da = new OleDbDataAdapter(query, connection);
    da.SelectCommand.Parameters.Clear();
    //some code to build the string, AND build the select parameters, here's the important one though
    da.SelectCommand.Parameters.AddWithValue("@USD", Convert.ToDouble(FilterUSD.Text));
    DataTable dt = new DataTable();
    da.Fill(dt);
    DGVMain.DataSource = dt;
    connection.Close();
}
My problem:
When I retrieve the values, as you'd expect, they must exactly match the value of FilterUSD.Text, so when a user searches for 12 and the DB contains 12.31251, the query returns 0 rows. How do I make it so that when the parameter is 12, it returns all values whose whole-number part is 12, whatever the decimals? The examples I've looked at online seem to suggest using a SqlDataReader and retrieving the values into a Double variable.
How do I proceed in the context of using a data adapter, as I am currently using, to fill the datatable? (and later, a datagridview)
My hunch is I will have to make use of the parameters of my select command.
Found this link, but I don't know how to adapt it for my use with a DataTable: Read decimal from SQL Server database
I'm assuming that you're only ever searching for positive whole numbers, so here are two ways I can think of. I'd probably go with the first one, in case anything unexpected happens with the CAST function.
Add two comparisons to the WHERE clause to pick up records greater than or equal to your search parameter and less than your search parameter + 1.
[Usd] >= 12 AND [Usd] < 13
Cast the db field to an int in the WHERE clause, so that the decimal places are removed.
cast([Usd] as int) = 12
EDIT: Didn't realise you were using Access (above is for SQL Server). This should be used instead: Int([Usd]) = 12.
If you want to work with negative numbers as well, then you'll get different results from these 2 options.
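The two predicates can be compared in plain C# to see where they diverge. This is a sketch, not Access or SQL Server code: `MatchesRange` mirrors `[Usd] >= n AND [Usd] < n + 1`, and `MatchesTruncated` mirrors SQL Server's `cast([Usd] as int) = n`, which truncates toward zero. Access's `Int()` rounds down instead (like `Math.Floor`), so it behaves like the range test. `UsdFilter` is an illustrative name.

```csharp
using System;

public static class UsdFilter
{
    // Mirrors: [Usd] >= n AND [Usd] < n + 1  (half-open range [n, n + 1))
    public static bool MatchesRange(double usd, int n) => usd >= n && usd < n + 1;

    // Mirrors SQL Server's cast([Usd] as int) = n, which drops the
    // fraction by truncating toward zero.
    public static bool MatchesTruncated(double usd, int n) => Math.Truncate(usd) == n;
}
```

For 12 and 12.31251 both return true; for -12 and -12.3 only the truncating version matches, while -11.5 matches only the range version.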
Select * From [db] where [Date] (between) @value1 and @value2 AND [ID] (=) @ID AND left([Usd],len(Usd)) (=) @Usd
The code I added is the LEFT function. I'm not sure whether it works in Access, but I've used it in SQL Server, so give it a shot. If the DB contains 12.33321 and your parameter "@Usd" is 12, the DB will only give you the whole number you wanted, and if the parameter is -12 but the DB contains -12.3123, it will still give you -12.
We are building an MVC project that needs to make use of the MVC DataGrid. As part of that, we want to allow filtering and ordering of the DataGrid columns. We want this to be handled on the SQL side, with paging. Handling the paging is really straightforward and we've already got that working with our stored procedures.
The challenge we are facing now is how to get what columns the user has sorted by, into the stored procedure so we can sort the records during paging. I played with using a Table Type to send in a 'collection' of columns using something like this:
CREATE TYPE [dbo].[SortableEntity] AS TABLE(
    [TableName] [varchar](50) NULL,
    [ColumnName] [varchar](50) NULL,
    [Descending] [bit] NULL
)
GO
CREATE PROCEDURE [dbo].[DoSomethingWithEmployees]
    @SortOrder AS dbo.SortableEntity READONLY
AS
BEGIN
    SELECT [ColumnName] FROM @SortOrder
END
We're using Dapper as our ORM, and we're constrained to using only Stored Procedures by policy. In my Repository, I use the following DataTable to try and insert the records into the SortableEntity which works fine.
var parameters = new DynamicParameters();
// Check if we have anything to sort by
IEnumerable<SortDefinition> sortingDefinitions = builder.GetSortDefinitions();
if (sortingDefinitions.Any())
{
    var dt = new DataTable();
    dt.Columns.Add(nameof(SortableEntity.TableName), typeof(string));
    dt.Columns.Add(nameof(SortableEntity.ColumnName), typeof(string));
    dt.Columns.Add(nameof(SortableEntity.IsDescending), typeof(bool));
    Type tableType = typeof(SortableEntity);
    foreach (SortDefinition sortDefinition in sortingDefinitions)
    {
        var dataRow = dt.NewRow();
        dataRow.SetField(0, sortDefinition.TableName);
        dataRow.SetField(1, sortDefinition.Column);        // ColumnName is the second column (index 1)
        dataRow.SetField(2, sortDefinition.IsDescending);
        dt.Rows.Add(dataRow);
    }
    parameters.Add("SortOrder", dt.AsTableValuedParameter(tableType.Name));
}
With this I'm able to get my sort values into the stored procedure, but I'm concerned about SQL injection. One way I can see around it is to look up the sys.columns catalog view to check whether the given columns are valid before using them. I'm not sure how to go about doing that, and then how to apply the valid columns to an ORDER BY statement in my stored procedure. Since we're not using SQL parameter objects for the values being inserted into the DataTable, how do we protect against SQL injection? I know using DynamicParameters will protect us for the values going into the stored procedure parameters, but how does that work when the value is a table containing values?
The biggest challenge, though, is the WHERE clause. We want to pass a filter from the data grid into the stored procedure, so users can filter the result sets. The idea is that the stored procedure would filter, order and page for us. I know I can handle this easily in Dapper using embedded or dynamic SQL; attempting to handle it via a stored procedure has proven to be over my head. What would I need to do to have my stored procedure receive a predicate from the app, applicable to a series of columns, and apply it as a WHERE clause in a safe manner that won't open us up to SQL injection?
I guess the only way to make your parameter inputs 'safe' is to check the values before assigning to your stored proc parameters. You'd have to look for 'SELECT', 'DELETE', and 'UPDATE'. But, I think since you are working with column names instead of entire dynamic SQL commands, you should be ok. Read the following: tsql - how to prevent SQL injection
But, I'm no expert on this. You should do your own research.
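Along the lines of the question's own idea (accept only names known to exist, for example names read from `sys.columns` for the target table), a whitelist check might look like the sketch below. `SortRequest`, `OrderByBuilder` and the column names in the test are hypothetical; the same check could instead be done inside the procedure by joining `@SortOrder` to `sys.columns` and wrapping each name in `QUOTENAME` before building dynamic SQL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical request type: one entry per column the grid is sorted by.
public sealed class SortRequest
{
    public SortRequest(string column, bool descending)
    {
        Column = column;
        Descending = descending;
    }

    public string Column { get; }
    public bool Descending { get; }
}

public static class OrderByBuilder
{
    // Wraps an identifier in brackets and doubles any ']' inside it,
    // the same escaping T-SQL's QUOTENAME applies with its default delimiter.
    public static string QuoteName(string name) => "[" + name.Replace("]", "]]") + "]";

    // Builds an ORDER BY clause only from columns present in the whitelist.
    public static string Build(IEnumerable<SortRequest> requests, ISet<string> allowedColumns)
    {
        var parts = new List<string>();
        foreach (SortRequest request in requests)
        {
            if (!allowedColumns.Contains(request.Column))
                throw new ArgumentException($"Unknown sort column: {request.Column}");
            parts.Add(QuoteName(request.Column) + (request.Descending ? " DESC" : " ASC"));
        }
        return parts.Count == 0 ? "" : "ORDER BY " + string.Join(", ", parts);
    }
}
```

Since the sort direction is never taken from user text (only from a bool), the only user-controlled fragments are column names, and those must match the whitelist exactly.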
To give you an idea of how to handle dynamic filtering in a stored procedure, I just use a SQL function that splits up a string of comma-separated values and turns it into a table. I JOIN this function with the table that contains the column that needs to be filtered. For example, I need to filter my dataset with multiple values using the DIVISION column from some table. My stored procedure takes an optional VARCHAR parameter of length 3000:
@strDIVISION VARCHAR(3000) = NULL
Next, when receiving a NULL value for this parameter, give it an empty string value:
SELECT @strDIVISION = ISNULL(@strDIVISION,'')
Instead of filtering in the WHERE clause, you can JOIN the string split function as such:
...
FROM tblTransDTL td
INNER JOIN tblTransHDR th ON th.JOB_ID = td.JOB_ID
INNER JOIN dbo.udf_STRSPLIT(@strDIVISION) d1 ON
    (d1.Value = th.DIVISION OR 1=CASE @strDIVISION WHEN '' THEN 1 ELSE 0 END)
The CASE statement helps to determine when all values should be allowed or use only the values from the parameter input.
Lastly, this is the SQL function that splits the string values into a table:
CREATE FUNCTION udf_STRSPLIT
(
    @Delim_Values VARCHAR(8000)
)
RETURNS @Result TABLE(Value VARCHAR(2000))
AS
begin
    WITH StrCTE(start, stop) AS
    (
        SELECT 1, CHARINDEX(',' , @Delim_Values )
        UNION ALL
        SELECT stop + 1, CHARINDEX(',' ,@Delim_Values , stop + 1)
        FROM StrCTE
        WHERE stop > 0
    )
    insert into @Result
    SELECT SUBSTRING(@Delim_Values , start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS stringValue
    FROM StrCTE
    return
end
GO
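The filtering behaviour of that join can be modelled in C# to check edge cases before touching the procedure. This is a sketch of the semantics only, assuming ordinal (case-sensitive) comparison, whereas SQL Server compares using the column's collation, which is often case-insensitive. As in `udf_STRSPLIT`, values are not trimmed, and an empty string yields a single empty value, so the INNER JOIN still has exactly one row to pair each record with when the CASE branch accepts everything. `DivisionFilter` is an illustrative name.

```csharp
using System.Linq;

public static class DivisionFilter
{
    // Mirrors udf_STRSPLIT: split on commas, no trimming;
    // an empty input yields one empty value, like the CTE's anchor row.
    public static string[] SplitValues(string delimited) => (delimited ?? "").Split(',');

    // Mirrors the join condition: keep the row when the (ISNULL-defaulted) filter
    // is empty, or when DIVISION equals one of the listed values (ordinal comparison).
    public static bool Keep(string division, string filter)
    {
        string f = filter ?? "";
        return f.Length == 0 || SplitValues(f).Contains(division);
    }
}
```

One consequence worth knowing: a filter of "A, B" matches "A" and " B" (with the leading space), not "B".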
I am inserting data through a user-defined table type, and I am getting back the inserted identity values in incorrect order.
Here is the stored procedure:
CREATE PROCEDURE [dbo].[InsertBulkAgency](
    @AgencyHeaders As AgencyType READONLY)
AS
BEGIN
    insert into [Agency]
    output Inserted.Id
    select *,'cm',GETDATE(),'comp',GETDATE() from @AgencyHeaders
END
And here is the C# code to save the identity values:
using (SqlCommand cmd = new SqlCommand("InsertBulkAgency", myConnection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@AgencyHeaders", SqlDbType.Structured).Value = dt;
    myConnection.Open();
    rdr = cmd.ExecuteReader();
    while (rdr.Read())
    {
        sample.Add(rdr["Id"].ToString());
    }
    myConnection.Close();
}
The returned values in the list should be sequential but they are completely random. How can I retrieve back the correct order of inserted values?
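Did you try adding an ORDER BY? The rows produced by OUTPUT come back in no guaranteed order, and inserted.Id cannot be referenced in the SELECT's ORDER BY, so capture the identities first and sort them when you return them:
DECLARE @ids TABLE (Id int);
insert into dbo.[Agency]
output inserted.Id into @ids (Id)
select *,'cm',GETDATE(),'comp',GETDATE() from @AgencyHeaders;
SELECT Id FROM @ids ORDER BY Id;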
Or sort the list (for example with List<T>.Sort()) once you have the data back in your application.
If you don't have an ORDER BY, you shouldn't expect any specific order from SQL Server. Why should the values in the list be in any sequential order, if you have just said "give me this set"? Why should SQL Server predict that you want them sorted in any specific way? And if it did assume you wanted the data sorted, why wouldn't it pick name or any other column to order by? Truth is, SQL Server will pick whatever sort order it deems most efficient, if you've effectively told it you don't care, by not bothering to specify.
Also, why are you converting the Id to a string (which will also cause problems with sorting, since '55' < '9')? I suggest you make sure your list uses a numeric type rather than a string, otherwise it will not always sort the way you expect.
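A quick illustration of that string-sorting pitfall. It is a sketch, assuming the values were read as text the way `rdr["Id"].ToString()` does; reading them with `rdr.GetInt32(0)` (Id is the only column the OUTPUT clause returns) into a `List<int>` avoids the issue. `IdSorting` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdSorting
{
    // Text comparison goes character by character, so "55" sorts before "9".
    public static List<string> SortAsStrings(IEnumerable<string> ids)
    {
        List<string> list = ids.ToList();
        list.Sort(StringComparer.Ordinal);
        return list;
    }

    // Parsing to int first gives numeric order: 9 before 55.
    public static List<int> SortAsNumbers(IEnumerable<string> ids)
    {
        List<int> list = ids.Select(s => int.Parse(s)).ToList();
        list.Sort();
        return list;
    }
}
```

For the input "9", "55", "10" the string version yields 10, 55, 9 and the numeric version yields 9, 10, 55.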
I have found a strange phenomenon on MS SQL Server.
Let's say we have a table:
CREATE TABLE [testTable]
(
[ID] [numeric](11, 0) NOT NULL,
[Updated] [datetime] NULL,
PRIMARY KEY (ID)
);
I do a simple select based on Updated field:
SELECT TOP 10000 ID, Updated
FROM testTable
WHERE Updated>='2013-05-22 08:55:12.152'
ORDER BY Updated
And now comes the fun part: how can the result set contain duplicate records, i.e. the same ID in two records with different Updated values?
It seems to me that the Updated datetime value was changed and the row was included in the result set one more time. But is that possible?
UPDATE:
Source code I'm using to download data from SQL Server:
using (SqlCommand cmd = new SqlCommand(sql, Connection) { CommandTimeout = commandTimeout })
{
    using (System.Data.SqlClient.SqlDataAdapter adapter = new System.Data.SqlClient.SqlDataAdapter(cmd))
    {
        DataTable retVal = new DataTable();
        adapter.Fill(retVal);
        return retVal;
    }
}
Connection = SqlConnection
sql = "SELECT TOP 10000 ...."
Your question seems to lack some details but here's my ideas.
The first case I'd think of would be that you are somehow selecting those IDs twice (could be a join, group by, ...). Please manually check inside your table (in MSSQL Server rather than inside a function or method) to see whether there are duplicated IDs. If there are, the issue is that your primary key hasn't been set correctly. Otherwise, you will need to provide all the relevant code that is used to select the data in order to get more help.
Another case might be that someone or something altered the primary key so it is on both ID and Updated, allowing the same ID to be inserted twice as long as the Updated field doesn't match too.
You may also try this query to see if it returns duplicated IDs inside your context:
SELECT ID
from testTable
ORDER BY ID
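If you'd rather check the data your code actually received, a sketch like the one below lists repeated IDs in the DataTable filled by the adapter. It assumes the `ID` column arrives as `decimal`, which is how SqlClient maps `numeric(11,0)`; on the server, `GROUP BY ID HAVING COUNT(*) > 1` would answer the same question for the table itself. `DuplicateFinder` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DuplicateFinder
{
    // Returns, in ascending order, every ID value that appears in more than one row.
    public static List<decimal> FindDuplicateIds(DataTable table, string idColumn = "ID")
    {
        var counts = new Dictionary<decimal, int>();
        foreach (DataRow row in table.Rows)
        {
            // numeric(11, 0) is read by SqlClient as decimal.
            decimal id = Convert.ToDecimal(row[idColumn]);
            counts[id] = counts.TryGetValue(id, out int seen) ? seen + 1 : 1;
        }

        var duplicates = new List<decimal>();
        foreach (KeyValuePair<decimal, int> pair in counts)
        {
            if (pair.Value > 1)
                duplicates.Add(pair.Key);
        }
        duplicates.Sort();
        return duplicates;
    }
}
```

Running it on `retVal` right after `adapter.Fill(retVal)` would show whether the duplicates exist in what the client received.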
I hope this helps.