I know I can do a bulk insert into my table with an identity column by not specifying the SqlBulkCopyOptions.KeepIdentity as mentioned here.
What I would like to be able to do is get the identity values that the server generates and put them in my datatable, or even a list. I saw this post, but I want my code to be general, and I can't have a version column in all my tables. Any suggestions are much appreciated. Here is my code:
public void BulkInsert(DataTable dataTable, string DestinationTbl, int batchSize)
{
    // Get the DataTable
    DataTable dtInsertRows = dataTable;

    using (SqlBulkCopy sbc = new SqlBulkCopy(sConnectStr))
    {
        sbc.DestinationTableName = DestinationTbl;

        // Number of records to be processed in one go
        sbc.BatchSize = batchSize;

        // Add your column mappings here
        foreach (DataColumn dCol in dtInsertRows.Columns)
        {
            sbc.ColumnMappings.Add(dCol.ColumnName, dCol.ColumnName);
        }

        // Finally write to server
        sbc.WriteToServer(dtInsertRows);
    }
}
AFAIK, you can't.
The only ways (that I know of) to get the value(s) of the identity field are to use SCOPE_IDENTITY() when you insert row-by-row, or to use the OUTPUT approach when inserting an entire set.
The 'simplest' approach would probably be to SqlBulkCopy the records into the table and then fetch them back again later on. The problem is that it can be hard to properly (and quickly) fetch those rows from the server again (e.g. it would be rather ugly (and slow) to have a WHERE clause with IN (guid1, guid2, .., guid999998, guid999999) =)
I'm assuming performance is an issue here, as you're already using SqlBulkCopy, so I'd suggest going for the OUTPUT approach. In that case you'll first need a staging table to SqlBulkCopy your records into. That table should also include some kind of batch identifier (a GUID?) so that multiple threads can run side by side. You'll then need a stored procedure to INSERT <table> OUTPUT inserted.* SELECT the data from the staging table into the actual destination table, and also to clean up the staging table again. The recordset returned by that procedure will match 1:1 with the original dataset used to fill the staging table, but of course you should NOT rely on its order. In other words: your next challenge will then be matching the returned identity fields back to the original records in your application.
Thinking things over, I'd say that in all cases -- except the row-by-row & SCOPE_IDENTITY() approach, which is going to be dog-slow -- you'll need to have (or add) a 'key' to your data to link the generated IDs back to the original data =/
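For what it's worth, a rough sketch of what that could look like (purely illustrative -- the staging table MyTable_Staging, its BatchId column, and the destination MyTable with Name/Value columns are all assumed names, while dtInsertRows and sConnectStr come from the question's code):

// Hedged sketch of the staging-table + OUTPUT approach described above.
// "MyTable_Staging", "MyTable", "BatchId", "Name" and "Value" are assumed names.
var batchId = Guid.NewGuid();
if (!dtInsertRows.Columns.Contains("BatchId"))
    dtInsertRows.Columns.Add("BatchId", typeof(Guid));
foreach (DataRow row in dtInsertRows.Rows)
    row["BatchId"] = batchId;                       // marks this batch so threads can run side by side

using (var connection = new SqlConnection(sConnectStr))
{
    connection.Open();

    // 1) Bulk copy into the staging table.
    using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "MyTable_Staging" })
    {
        foreach (DataColumn col in dtInsertRows.Columns)
            bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);   // map by name, as in the question
        bulk.WriteToServer(dtInsertRows);
    }

    // 2) Move this batch into the real table, returning the generated identities,
    //    then clean up the staging rows.
    const string moveSql = @"
        INSERT INTO MyTable (Name, Value)
        OUTPUT inserted.Id, inserted.Name, inserted.Value
        SELECT Name, Value
        FROM MyTable_Staging
        WHERE BatchId = @batchId;

        DELETE FROM MyTable_Staging WHERE BatchId = @batchId;";

    using (var command = new SqlCommand(moveSql, connection))
    {
        command.Parameters.AddWithValue("@batchId", batchId);
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                int generatedId = reader.GetInt32(0);
                // Match generatedId back to the original rows via a business key,
                // not via row order (see the note above about ordering).
            }
        }
    }
}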
You can take an approach similar to the one described above by deroby, but instead of retrieving the rows back via a WHERE IN (guid1, ...), you match them back up to the rows inserted in memory based on their order.
So I would suggest adding a column to the table to match the rows to a SqlBulkCopy transaction, and then doing the following to match the generated IDs back to the in-memory collection of rows you just inserted:
Create a new Guid and set this value on all the rows in the bulk copy, mapped to the new column
Run the WriteToServer method of the BulkCopy object
Retrieve all the rows that have that same key
Iterate through this list, which will be in the order they were added; these will be in the same order as the in-memory collection of rows, so you will then know the generated ID for each item.
This will give you better performance than giving each individual row a unique key. So after you bulk insert the data table you could do something like this (in my example I have a list of objects from which I create the data table, and then map the generated IDs back to them):
List<myObject> myCollection = new List<myObject>();

Guid identifierKey = Guid.NewGuid();

// Do your bulk insert where all the rows inserted have the identifierKey
// set on the new column. In this example you would create a data table based
// off the myCollection object.

// Identifier is a column specifically for matching a group of rows to a SQL
// bulk copy command
var myAddedRows = myDbContext.DatastoreRows.AsNoTracking()
                  .Where(d => d.Identifier == identifierKey)
                  .ToList();

for (int i = 0; i < myAddedRows.Count; i++)
{
    var savedRow = myAddedRows[i];
    var inMemoryRow = myCollection[i];

    int generatedId = savedRow.Id;

    // Now you know the generated ID for the in-memory object, so you could
    // set a property on it to store the value
    inMemoryRow.GeneratedId = generatedId;
}
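For completeness, a rough sketch of the bulk-insert step that the comments above leave out (illustrative only; myObject's Name property, the DatastoreRows table name and connectionString are assumptions, not part of the original answer):

// Hedged sketch of the commented-out bulk insert step above.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Identifier", typeof(Guid));

foreach (var item in myCollection)
    table.Rows.Add(item.Name, identifierKey);   // every row carries the batch key

using (var bulk = new SqlBulkCopy(connectionString) { DestinationTableName = "DatastoreRows" })
{
    bulk.ColumnMappings.Add("Name", "Name");
    bulk.ColumnMappings.Add("Identifier", "Identifier");
    bulk.WriteToServer(table);
}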
Related
I have a table called People with the following schema:
Id INT NOT NULL IDENTITY(1, 1)
FirstName NVARCHAR(64) NOT NULL
LastName NVARCHAR(64) NOT NULL
I am using a query like this one to perform inserts and updates in one statement:
MERGE INTO People AS TARGET
USING ( VALUES
(#id0, #firstName0, #lastname0),
(#id1, #firstName1, #lastname1)
...
) AS SOURCE ([Id],[FirstName],[LastName])
ON TARGET.[Id] = SOURCE.[Id]
WHEN MATCHED THEN
UPDATE SET
[FirstName] = SOURCE.[FirstName],
[LastName] = SOURCE.[LastName]
WHEN NOT MATCHED BY TARGET THEN
INSERT ([FirstName],[LastName])
VALUES (SOURCE.[FirstName], SOURCE.[LastName])
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT $action, INSERTED.*;
My application is structured such that the client calls back to the server to load the existing state of the app. The client then creates/modifies/deletes entities locally and pushes those changes to the server in one bunch.
Here's an example of what my "SaveEntities" code currently looks like:
public void SavePeople(IEnumerable<Person> people)
{
    // Returns the query I mentioned above
    var query = GetMergeStatement(people);

    using (var command = new SqlCommand(query))
    {
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // how do I tie these records back to
                // the objects in the people collection?
            }
        }
    }
}
I can use the value in the $action column to filter down to just INSERTED records. INSERTED.* returns all of the columns in TARGET for the inserted record. The problem is I have no way of distinctly linking those results back to the collection of objects passed into this method.
The only solution I could think of was to add a writable GUID column to the table and allow the MERGE statement to specify that value so I could link back to these objects in code using that and assign the ID value from there, but that seems like it defeats the purpose of having an automatic identity column and feels convoluted.
I'm really curious how this can work because I know Entity Framework does something to mitigate this problem (to be clear, I believe I'd have the same problem were I to use a pure INSERT statement instead of a MERGE). In EF I can add objects to the model, call Entity.SaveChanges(), and have the entity's ID property auto-update using magic. I guess it's that kind of magic I'm looking to understand more.
Also, I know I could structure my saves to insert one record at a time and cascade the changes appropriately (by returning SCOPE_IDENTITY for every insert) but this would be terribly inefficient.
One of the things I love about the MERGE statement is that the source data is in scope in the OUTPUT clause.
OUTPUT $action, SOURCE.Id, INSERTED.Id;
On insert, this will give you three columns: 'INSERT' in the first, the values of #id0 and #id1 in the second, and the matching, newly inserted Id values in the third.
In your C# code, just read the rows as you normally would.
while (reader.Read())
{
    string action = reader.GetString(0);
    if (action == "INSERT")
    {
        int oldId = reader.GetInt32(1);
        int newId = reader.GetInt32(2);
        // Now do what you want with them.
    }
}
You can check for "DELETE" and "UPDATE" too, but keep in mind that ordinal 2 will be NULL on "DELETE" so you need to make sure you check for that before calling reader.GetInt32 in that case.
I've used this, in combination with table variables (OUTPUT SOURCE.Id, INSERTED.Id INTO #PersonMap ([OldId], [NewId])), to copy hierarchies 4 and 5 tables deep, all with identity columns.
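Tying that back to the question's SavePeople method, the read loop might end up looking something like this (a sketch only; it assumes the client-side placeholder key is stored in Person.Id and that Id is writable -- neither of which is stated in the question):

// Hedged sketch: push the server-generated IDs back onto the in-memory objects.
// Assumes Person.Id holds the client-side placeholder used as SOURCE.Id in the MERGE.
var peopleById = people.ToDictionary(p => p.Id);

while (reader.Read())
{
    if (reader.GetString(0) != "INSERT")
        continue;                          // skip UPDATE/DELETE rows

    int oldId = reader.GetInt32(1);        // SOURCE.Id, the placeholder sent up
    int newId = reader.GetInt32(2);        // INSERTED.Id, the identity the server generated

    peopleById[oldId].Id = newId;          // the object now carries its real key
}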
I used SQLite before, and adding multiple rows using Insert in a for loop was slow. The solution was using a transaction.
Now that I am using SQLiteAsyncConnection in SQLite.Net (for ORM), I also tried to use a transaction. It works, but with one problem: the insert order is not the order of the data.
Database.RunInTransactionAsync(
    (SQLiteConnection conn) => {
        foreach (var row in rows)
        {
            conn.InsertOrReplace(row);
        }
        conn.Commit();
    }
);
If rows contained [1,2,3,4,5,6], the rows in the database were something like [3,1,2,6,4,5]. How can I keep the original order?
Note that I only mean newly inserted rows. Even though the code is replacing existing rows, when testing there were no existing rows in the database to be replaced.
PS: The row has an ID field which is the [PrimaryKey], but the rows are not sorted by ID. It seems that in the database the rows are sorted by ID. I do not want them sorted by ID; I want the original order to be maintained.
PS 2: I need to know the ID of the last-inserted row. When viewing the database using a GUI tool like DB Browser for SQLite, or getting the last item with LIMIT 1, it seems SQLite had automatically sorted the rows by ID. I did some Google searching, and it said that by the rules of SQL, when there is no ORDER BY, the order of the returned rows is not guaranteed to be the physical order anyway. Should I create another field and set it as the primary, auto-incrementing field?
Currently, ID is guaranteed to be unique per row, but 'ID' is part of the data itself, not a field specially added for use with the database.
SQL tables are logically unordered, so if you want a certain order, you always have to use ORDER BY in your queries.
If your data does not contain any value (e.g., a timestamp) that corresponds to the insertion order, then you have to use the rowid, i.e., add a column declared as INTEGER PRIMARY KEY.
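A minimal sketch of what that could look like with sqlite-net attributes (the class and property names here are made up, not from the question):

// Hedged sketch: an INTEGER PRIMARY KEY column maps onto SQLite's rowid, so it
// records insertion order. "Row", "Sequence", "ExternalId" and "Payload" are assumed names.
public class Row
{
    [PrimaryKey, AutoIncrement]        // becomes the table's rowid
    public int Sequence { get; set; }

    public string ExternalId { get; set; }   // the original "ID" that is part of the data
    public string Payload { get; set; }
}

// Reading back in insertion order then requires an explicit ORDER BY:
// var ordered = conn.Table<Row>().OrderBy(r => r.Sequence).ToList();
// var last    = conn.Table<Row>().OrderByDescending(r => r.Sequence).FirstOrDefault();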
I'm wanting to do a bulk copy of data from one database to another. It needs to be dynamic enough so that when the users of the source database create new fields, there are minimal changes at the destination end (my end!).
I've done this using the SqlBulkCopy class, with column mappings set up in a separate table, so that if anything new is created all I need to do is create the new field and set up the mapping (no code or stored procedure changes):
foreach (var mapping in columnMapping)
{
    var split = mapping.Split(new[] { ',' });
    sbc.ColumnMappings.Add(split.First(), split.Last());
}

try
{
    sbc.WriteToServer(sourcedatatable);
}
However, the requirements have now changed.
I need to keep more data, sourced from elsewhere, in other columns in this table, which means I can't truncate the whole table and write everything with SqlBulkCopy. Now I need to be able to insert new records or update the relevant fields for current records, but still be dynamic enough that I won't need code changes if the users create new fields.
Does anyone have any ideas?
Comment on original question from mdisibio - it looks like the SQL MERGE statement would have been the answer.
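A rough sketch of what that direction could look like while keeping the mapping-table approach (illustrative only; the Destination/Destination_Staging table names and the assumption that the key column is called Id are mine, not from the question):

// Hedged sketch: bulk copy into a staging table exactly as before, then build a
// MERGE from the same column mappings so that newly created fields only need a
// new mapping row. "Destination", "Destination_Staging" and "Id" are assumptions.
var columns = columnMapping.Select(m => m.Split(',').Last()).ToList();

string setClause  = string.Join(", ", columns.Where(c => c != "Id")
                                             .Select(c => $"target.[{c}] = source.[{c}]"));
string columnList = string.Join(", ", columns.Select(c => $"[{c}]"));
string sourceList = string.Join(", ", columns.Select(c => $"source.[{c}]"));

string mergeSql = $@"
    MERGE INTO dbo.Destination AS target
    USING dbo.Destination_Staging AS source
        ON target.[Id] = source.[Id]
    WHEN MATCHED THEN
        UPDATE SET {setClause}
    WHEN NOT MATCHED BY TARGET THEN
        INSERT ({columnList}) VALUES ({sourceList});

    TRUNCATE TABLE dbo.Destination_Staging;";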
I have a SQL Server database with several tables. One of them has "ID" (primary key), "Name" and other columns that I won't mention here for the sake of simplicity. The "ID" column is auto-increment and unique, and when I add a row using SQL Server Management Studio, the "ID" column increments properly. The database is old and the current auto-increment value is at 1244 or so.
Now, I have created a C# project that uses a TYPED DataSet to work with data from the database. My database starts empty, the dataset is filled using table adapters, and new rows are added using my program, but there's a problem I have never stumbled upon before: when my program adds a new row to the DataSet and then updates the database (using the table adapter), the "ID" column in my database gets the correct auto-incremented number (1245, 1246, etc.), BUT the "ID" column in my dataset gets "-1", "-2" instead! What's the problem? How can I tell my dataset to use the auto-increment seed specified by the database instead of generating its own NEGATIVE (???) primary key numbers?
EDIT:
I get and compare rows using this:
dsNames.tbNamesRow[] TMP = basedataset.tbNames.Select() as dsNames.tbNamesRow[];

foreach (dsNames.tbNamesRow row in TMP)
{
    string Name = row.Name;

    bool Found = Name == Search;
    if (CompareDelegate != null)
        Found = CompareDelegate(Name, Search);

    if (Found)
    {
        int ID = row.ID;
        break;
    }
}
My original comment was kind of incorrect; I assumed you were retrieving the value from the database and THAT dataset had incorrect values in it.
The way ADO.NET prevents collisions with its disconnected DataSet is to assign negative IDENTITY column values, because, being disconnected, it cannot know a positive number that is NOT already taken. These (negative) values are unique in terms of that transaction.
When you try and commit your changes, the ADO.NET engine determines the proper SQL to produce the correct result.
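As an illustration of that behaviour (a sketch, not your exact typed dataset; tableAdapter and the refresh query in the comment are assumptions about the usual setup):

// Hedged sketch: configure the DataColumn so the locally generated placeholder
// keys count downwards and can never collide with real server identities.
DataColumn idColumn = basedataset.tbNames.Columns["ID"];
idColumn.AutoIncrement = true;
idColumn.AutoIncrementSeed = -1;   // placeholders become -1, -2, -3, ...
idColumn.AutoIncrementStep = -1;

// After tableAdapter.Update(...) the adapter can replace these placeholders with
// the real values, provided its INSERT command re-selects the row, e.g. by
// appending something like "; SELECT ID, Name FROM tbNames WHERE ID = SCOPE_IDENTITY()".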
We can add one more column to the iC_ProductImageAssociation table called 'ProductFeatureApplicabilityId'.
This column will refer to iC_ProductFeatureApplicability. So when a product, say ABC with a ProductFeature of Color 'RED', is inserted into iC_ProductFeatureApplicability, we can take this ProductFeatureApplicabilityId and store it in the iC_ProductImageAssociation table.
So now an image can be applied to a product, to a ProductFeature, or both. I am also planning to propose an alternate ProductFeature data model,
in which, rather than storing individual features as columns (as currently in the iC_ProductFeature table, where we store Color, Size, Brand, etc. as separate columns), we create a master table of the product features (iC_ProductFeatureMasters) that stores all these columns as rows, so that at runtime the administrator can define more features.
So iC_ProductFeatureMasters will store the data as:

ProductFeatureMasterId    FeatureName
1                         Color
2                         Size
3                         Brand
4                         Dimensions

and the iC_ProductFeature table will store the ProductFeatureMasterId and its value, so iC_ProductFeature will now look like this:

ProductFeatureId    ProductFeatureMasterId    Description    UOM ID
1                   1                         RED
2                   4                         10             1
Here is the example from my code:
var table = new DataTable();
var sqlCopy = new SqlBulkCopy(dataBaseConnection, SqlBulkCopyOptions.Default, sqlTransaction)
              { DestinationTableName = destinationTableName };
sqlCopy.WriteToServer(table);
You can find more information by following the links below:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
http://www.sqlteam.com/article/use-sqlbulkcopy-to-quickly-load-data-from-your-client-to-sql-server
You cannot do a BulkCopy for several tables at once, so you need a separate BulkCopy for each of your tables.
For transactional behavior you should create a Transaction object and pass it to the constructors of the BulkCopy objects.
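A minimal sketch of that (illustrative only; the connection string, the two DataTables and the destination table names are placeholders, not from the question):

// Hedged sketch: one SqlTransaction shared by several SqlBulkCopy calls so the
// tables either all load or none do.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        try
        {
            foreach (var (data, destination) in new[] { (ordersTable, "Orders"), (orderLinesTable, "OrderLines") })
            {
                using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
                {
                    bulkCopy.DestinationTableName = destination;
                    bulkCopy.WriteToServer(data);    // runs inside the shared transaction
                }
            }
            transaction.Commit();    // all tables succeed together
        }
        catch
        {
            transaction.Rollback();  // or fail together
            throw;
        }
    }
}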
Chances are, the triggers and other logic that needs to be executed with each row insert is what is keeping things slow, not the insert method. Even bulk copy will not be fast if it needs to execute triggers.
I'd recommend refactoring the logic to run on all of the rows after they have been inserted, rather than one at a time. Normally you'd create staging tables for the new data, where it would be stored while being processed and before being merged with the regular data tables.