We can add one more column to the iC_ProductImageAssociation table, called 'ProductFeatureApplicabilityId', which refers to iC_ProductFeatureApplicability. So when a product, say ABC, with a ProductFeature of Color 'RED' is inserted into iC_ProductFeatureApplicability, we can take its ProductFeatureApplicabilityId and store it in the iC_ProductImageAssociation table.
An image can then be applied to a product, to a ProductFeature, or to both. I am also planning to produce an alternate ProductFeature data model.
In that model, rather than storing each feature as its own column (as the iC_ProductFeature table currently does with Color, Size, Brand, etc.), we create a master table of product features (iC_ProductFeatureMasters) that stores these features as rows, so an administrator can define more features at runtime.
So iC_ProductFeatureMasters will store the data as:

ProductFeatureMasterId  FeatureName
1                       Color
2                       Size
3                       Brand
4                       Dimensions

The iC_ProductFeature table will store the ProductFeatureMasterId and its value, so iC_ProductFeature will now look like this:

ProductFeatureId  ProductFeatureMasterId  Description  UOM ID
1                 1                       RED
2                 4                       10           1
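As a rough sketch, the two tables above could be declared like this in T-SQL. The column types, lengths, constraints and the `dbo` schema are assumptions (the post only names the columns), and `UOMID` stands in for the "UOM ID" column. The sketch creates both tables from scratch rather than migrating the existing iC_ProductFeature table.

```sql
-- Sketch only: types, lengths and constraints are assumed, not taken from the post.
CREATE TABLE dbo.iC_ProductFeatureMasters (
    ProductFeatureMasterId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FeatureName            NVARCHAR(100)     NOT NULL UNIQUE
);

CREATE TABLE dbo.iC_ProductFeature (
    ProductFeatureId       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ProductFeatureMasterId INT               NOT NULL
        REFERENCES dbo.iC_ProductFeatureMasters (ProductFeatureMasterId),
    [Description]          NVARCHAR(255)     NOT NULL,  -- the feature value, e.g. 'RED'
    UOMID                  INT               NULL       -- unit of measure, if any
);

-- Features are now rows, so an administrator can add one with a plain INSERT.
INSERT INTO dbo.iC_ProductFeatureMasters (FeatureName)
VALUES (N'Color'), (N'Size'), (N'Brand'), (N'Dimensions');

-- Look the master id up by name instead of hard-coding identity values.
INSERT INTO dbo.iC_ProductFeature (ProductFeatureMasterId, [Description], UOMID)
SELECT m.ProductFeatureMasterId, v.FeatureValue, v.UOMID
FROM (VALUES (N'Color',      N'RED', NULL),
             (N'Dimensions', N'10',  1)) AS v (FeatureName, FeatureValue, UOMID)
JOIN dbo.iC_ProductFeatureMasters AS m
    ON m.FeatureName = v.FeatureName;

-- Read the features back together with their names.
SELECT f.ProductFeatureId, m.FeatureName, f.[Description], f.UOMID
FROM dbo.iC_ProductFeature AS f
JOIN dbo.iC_ProductFeatureMasters AS m
    ON m.ProductFeatureMasterId = f.ProductFeatureMasterId;
```

The trade-off of this layout is that every feature value ends up in one text column, so type checking of values (numbers versus names) moves into the application.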
Here is the example from my code:
var table = new DataTable();
// ... fill the table with the rows to insert ...
using (var sqlCopy = new SqlBulkCopy(dataBaseConnection, SqlBulkCopyOptions.Default, sqlTransaction) { DestinationTableName = destinationTableName })
{
    sqlCopy.WriteToServer(table);
}
You can find more information by following the links below:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
http://www.sqlteam.com/article/use-sqlbulkcopy-to-quickly-load-data-from-your-client-to-sql-server
You cannot bulk copy into several tables at once, so you need one SqlBulkCopy per table.
For transactional behavior, create a transaction object and pass it to the constructor of each SqlBulkCopy object.
Chances are that the triggers and other logic that need to be executed with each row insert are what is keeping things slow, not the insert method. Even bulk copy will not be fast if it needs to execute triggers.
I'd recommend refactoring the logic to run on all of the rows after they have been inserted, rather than one at a time. Normally you'd create staging tables for the new data, where it would be stored while being processed and before being merged with the regular data tables.
Related
I want to do a bulk copy of data from one database to another. It needs to be dynamic enough that when the users of the source database create new fields, there are minimal changes at the destination end (my end!).
I've done this using the SqlBulkCopy class, with column mappings set up in a separate table, so that if anything new is created all I need to do is create the new field and set up the mapping (no code or stored procedure changes):
foreach (var mapping in columnMapping)
{
    var split = mapping.Split(new[] { ',' });
    sbc.ColumnMappings.Add(split.First(), split.Last());
}
try
{
    sbc.WriteToServer(sourcedatatable);
}
However, the requirements have now changed.
I need to keep more data, sourced from elsewhere, in other columns in this table which means I can't truncate the whole table and write everything with the sqlbulkcopy. Now, I need to be able to Insert new records or Update the relevant fields for current records, but still be dynamic enough that I won't need code changes if the users create new fields.
Does anyone have any ideas?
Comment on original question from mdisibio - it looks like the SQL MERGE statement would have been the answer.
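A minimal sketch of what that MERGE could look like, assuming the source rows are first bulk copied into a staging table. The names `dbo.Destination`, `dbo.Destination_Staging`, `SourceKey`, `FieldA` and `FieldB` are hypothetical and not from the question. Columns that are not listed in the UPDATE SET clause (the data sourced from elsewhere) are left untouched.

```sql
-- Sketch only: all table and column names are placeholders.
-- dbo.Destination_Staging is assumed to have been filled by SqlBulkCopy beforehand.
MERGE dbo.Destination AS target
USING dbo.Destination_Staging AS source
    ON target.SourceKey = source.SourceKey
WHEN MATCHED THEN
    -- Existing record: update only the fields that come from the source database.
    UPDATE SET target.FieldA = source.FieldA,
               target.FieldB = source.FieldB
WHEN NOT MATCHED BY TARGET THEN
    -- New record: insert it; columns fed from other sources keep their defaults.
    INSERT (SourceKey, FieldA, FieldB)
    VALUES (source.SourceKey, source.FieldA, source.FieldB);
```

To stay dynamic when users add fields, the column lists would presumably have to be generated from the same mapping table that drives the bulk copy, for example as dynamic SQL. That part is not shown here.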
I know I can do a bulk insert into my table with an identity column by not specifying the SqlBulkCopyOptions.KeepIdentity as mentioned here.
What I would like to be able to do is get the identity values that the server generates and put them in my datatable, or even a list. I saw this post, but I want my code to be general, and I can't have a version column in all my tables. Any suggestions are much appreciated. Here is my code:
public void BulkInsert(DataTable dataTable, string DestinationTbl, int batchSize)
{
    // Get the DataTable
    DataTable dtInsertRows = dataTable;
    using (SqlBulkCopy sbc = new SqlBulkCopy(sConnectStr))
    {
        sbc.DestinationTableName = DestinationTbl;
        // Number of records to be processed in one go
        sbc.BatchSize = batchSize;
        // Add your column mappings here
        foreach (DataColumn dCol in dtInsertRows.Columns)
        {
            sbc.ColumnMappings.Add(dCol.ColumnName, dCol.ColumnName);
        }
        // Finally write to server
        sbc.WriteToServer(dtInsertRows);
    }
}
AFAIK, you can't.
The only way (that I know of) to get the value(s) of the identity field is by using either SCOPE_IDENTITY() when you insert row by row, or the OUTPUT approach when inserting an entire set.
The 'simplest' approach would probably be to SqlBulkCopy the records into the table and then fetch them back again later. The problem is that it could be hard to fetch those rows from the server properly (and quickly); for example, it would be rather ugly (and slow) to have a WHERE clause with IN (guid1, guid2, .., guid999998, guid999999) =)
I'm assuming performance is an issue here, as you're already using SqlBulkCopy, so I'd suggest going for the OUTPUT approach, in which case you'll first need a staging table to SqlBulkCopy your records into. That table should include some kind of batch identifier (GUID?) to allow multiple threads to run side by side. You'll need a stored procedure that does an INSERT <table> OUTPUT inserted.* SELECT of the data from the staging table into the actual destination table and then cleans up the staging table again. The recordset returned from that procedure will match the original dataset that filled the staging table 1:1, but of course you should NOT rely on its order. In other words: your next challenge will then be matching the returned identity fields back to the original records in your application.
Thinking things over, I'd say that in all cases -- except the row-by-row & SCOPE_IDENTITY() approach, which is going to be dog-slow -- you'll need to have (or add) a 'key' to your data to link the generated IDs back to the original data =/
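The staging-table and OUTPUT approach could be sketched roughly as follows. Every name here (`dbo.Destination`, `dbo.Destination_Staging`, `ExternalRef`, `Name`, `dbo.ImportFromStaging`) is hypothetical; `ExternalRef` plays the role of the 'key' that links the generated IDs back to the original rows, and it has to exist in the destination table because a plain INSERT ... OUTPUT can only return columns of the inserted rows.

```sql
-- Sketch only: table, column and procedure names are placeholders.
CREATE TABLE dbo.Destination (
    Id          INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ExternalRef NVARCHAR(50)      NOT NULL,  -- client-side key used for matching
    Name        NVARCHAR(100)     NOT NULL
);

CREATE TABLE dbo.Destination_Staging (
    BatchId     UNIQUEIDENTIFIER  NOT NULL,  -- one value per SqlBulkCopy run
    ExternalRef NVARCHAR(50)      NOT NULL,
    Name        NVARCHAR(100)     NOT NULL
);
GO

CREATE PROCEDURE dbo.ImportFromStaging
    @BatchId UNIQUEIDENTIFIER
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    BEGIN TRANSACTION;

    -- Move this batch into the real table and return the generated identities
    -- together with the key the client needs to match them up again.
    INSERT INTO dbo.Destination (ExternalRef, Name)
    OUTPUT inserted.Id, inserted.ExternalRef
    SELECT s.ExternalRef, s.Name
    FROM dbo.Destination_Staging AS s
    WHERE s.BatchId = @BatchId;

    -- Clean up only the rows that belong to this batch.
    DELETE FROM dbo.Destination_Staging
    WHERE BatchId = @BatchId;

    COMMIT TRANSACTION;
END
GO
```

The client would bulk copy its rows into the staging table with a fresh GUID in `BatchId`, execute the procedure with that GUID, and read the returned (Id, ExternalRef) pairs into a lookup instead of relying on row order. Note that returning OUTPUT rows directly to the caller is not allowed when the destination table has enabled triggers; in that case OUTPUT ... INTO a table variable would be needed.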
You can take an approach similar to the one deroby describes above, but instead of retrieving the rows via a WHERE IN (guid1, etc...), you match them back up to the in-memory rows based on their order.
So I would suggest adding a column to the table that ties each row to one SqlBulkCopy operation, and then doing the following to match the generated IDs back to the in-memory collection of rows you just inserted:
1. Create a new Guid and set this value on all the rows in the bulk copy, mapped to the new column.
2. Run the WriteToServer method of the SqlBulkCopy object.
3. Retrieve all the rows that have that same key, ordered by the identity column.
4. Iterate through this list. This relies on the identity values having been assigned in the order the rows were added, so that the list lines up with the in-memory collection and you know the generated ID for each item. SQL Server does not guarantee the order of a result set without an ORDER BY, so sort explicitly and treat the matching order as an assumption to verify.
This will give you better performance than giving each individual row a unique key. So after you bulk insert the data table you could do something like this (in my example I have a list of objects from which I create the data table, and I then map the generated IDs back to them):
List<myObject> myCollection = new List<myObject>();
Guid identifierKey = Guid.NewGuid();
//Do your bulk insert where all the rows inserted have the identifierKey
//set on the new column. In this example you would create a data table based
//off the myCollection object.
//Identifier is a column specifically for matching a group of rows to a sql
//bulk copy command
var myAddedRows = myDbContext.DatastoreRows.AsNoTracking()
    .Where(d => d.Identifier == identifierKey)
    .OrderBy(d => d.Id) //explicit order; without it the row order is not guaranteed
    .ToList();
for (int i = 0; i < myAddedRows.Count; i++)
{
    var savedRow = myAddedRows[i];
    var inMemoryRow = myCollection[i];
    int generatedId = savedRow.Id;
    //Now you know the generatedId for the in memory object, you could set
    //a property on it to store the value
    inMemoryRow.GeneratedId = generatedId;
}
Is it possible to store data in sets or batches of 12 in sql? I have a query which before inserting a new row should check if the existing records are equal or less than twelve. If they are twelve then it should create a new batch which also stores a maximum of twelve records.
There's no built-in concept of data batching in SQL Server, or any other SQL-like database out there. What you're describing is data grouping/relations, which is the responsibility of the database designer to figure out. What you should do is create a related table, called Batches. Give it a primary key. Then place that primary key column into your actual table with data as a foreign key. The logic about when batches are created should probably be defined in a trigger, or define the logic in a UDF and set the Default Value of that BatchId column to the result of that UDF.
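A rough sketch of that design, with hypothetical names (`dbo.Batches`, `dbo.Items`, `Payload`, `dbo.AddItem`). For brevity it puts the "start a new batch after twelve rows" logic in a stored procedure rather than in the trigger or UDF default suggested above; the table layout is the same either way.

```sql
-- Sketch only: names, types and the use of a procedure are assumptions.
CREATE TABLE dbo.Batches (
    BatchId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CreatedAt DATETIME2         NOT NULL DEFAULT SYSUTCDATETIME()
);

CREATE TABLE dbo.Items (
    ItemId  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    BatchId INT               NOT NULL REFERENCES dbo.Batches (BatchId),
    Payload NVARCHAR(100)     NOT NULL
);
GO

CREATE PROCEDURE dbo.AddItem
    @Payload NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    BEGIN TRANSACTION;

    DECLARE @BatchId INT;

    -- Find the newest batch; the lock hints are meant to keep two concurrent
    -- callers from both filling the last free slot of the same batch.
    SELECT TOP (1) @BatchId = b.BatchId
    FROM dbo.Batches AS b WITH (UPDLOCK, HOLDLOCK)
    ORDER BY b.BatchId DESC;

    -- No batch yet, or the newest one already holds twelve rows: start a new one.
    IF @BatchId IS NULL
       OR (SELECT COUNT(*) FROM dbo.Items WHERE BatchId = @BatchId) >= 12
    BEGIN
        INSERT INTO dbo.Batches DEFAULT VALUES;
        SET @BatchId = SCOPE_IDENTITY();
    END

    INSERT INTO dbo.Items (BatchId, Payload)
    VALUES (@BatchId, @Payload);

    COMMIT TRANSACTION;
END
GO
```

Calling `EXEC dbo.AddItem @Payload = N'some value';` thirteen times on empty tables should leave twelve rows in the first batch and one in the second.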
I have used BulkCopy command to transfer rows from one table to another table with bulk data about 3 to 5 million rows. I want to update these rows.
Is there any BulkUpdate command similar to the BulkCopy command? I'm using ASP.NET with C#.
No, there isn't.
This might help:
http://itknowledgeexchange.techtarget.com/itanswers/bulk-update-in-sql-server-2005/
Assuming that you have a column with distinct values that identifies which rows correspond to each other in the two tables, this can be done with a simple update statement.
UPDATE TableA
SET    TableA.A1 = TableB.B1,
       TableA.A2 = TableB.B2
FROM   TableB
WHERE  TableA.A3 = TableB.B3
If you are worried about creating one massive transaction you can
batch the operation into smaller chunks. This is done via the TOP
keyword.
UPDATE TOP (1000) TableA
SET    TableA.A1 = TableB.B1,
       TableA.A2 = TableB.B2
FROM   TableB
WHERE  TableA.A3 = TableB.B3
  -- OR, not AND: a row still needs updating if either column differs
  AND (TableA.A1 <> TableB.B1
       OR TableA.A2 <> TableB.B2)
You can put that into a loop...
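One way to write that loop, as a sketch using the same TableA/TableB columns as above (the `dbo` schema and the batch size of 1000 are assumptions): keep updating in chunks until a pass changes no rows.

```sql
-- Sketch only: repeat the batched update until nothing is left to change.
WHILE 1 = 1
BEGIN
    UPDATE TOP (1000) a
    SET    a.A1 = b.B1,
           a.A2 = b.B2
    FROM   dbo.TableA AS a
    INNER JOIN dbo.TableB AS b
        ON a.A3 = b.B3
    -- Only rows that still differ qualify, so each pass makes progress.
    -- Note: <> never matches NULLs; nullable columns need extra IS NULL checks.
    WHERE  a.A1 <> b.B1
       OR  a.A2 <> b.B2;

    -- @@ROWCOUNT holds the number of rows changed by the UPDATE just above.
    IF @@ROWCOUNT = 0
        BREAK;
END
```

Outside an explicit transaction each pass commits on its own, which is what keeps any single transaction small.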
Here's another link (with basically the same solution):
http://www.sqlusa.com/bestpractices2005/hugeupdate/
A common approach here is:
1. Bulk-load (SqlBulkCopy) into an empty staging table, meaning a table with the same columns/types as the actual data, but not part of the main transactional system.
2. Now do an update joining the real data to the staging data, to update the values in the real data.
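The two steps could look roughly like this on the SQL side. `dbo.TableA`, `dbo.TableA_Staging` and the columns `A1`, `A2`, `A3` are placeholders, with `A3` assumed to be the key shared by both tables.

```sql
-- Sketch only: all names are placeholders.
-- One-time setup: create an empty staging table with the same column types.
IF OBJECT_ID(N'dbo.TableA_Staging', N'U') IS NULL
    SELECT TOP (0) A1, A2, A3
    INTO   dbo.TableA_Staging
    FROM   dbo.TableA;

-- Before each run: make sure the staging table is empty.
TRUNCATE TABLE dbo.TableA_Staging;

-- Step 1 happens on the client: SqlBulkCopy the new values into
-- dbo.TableA_Staging (DestinationTableName = "dbo.TableA_Staging").

-- Step 2: one set-based update that joins the real data to the staging data.
UPDATE a
SET    a.A1 = s.A1,
       a.A2 = s.A2
FROM   dbo.TableA AS a
INNER JOIN dbo.TableA_Staging AS s
    ON s.A3 = a.A3;
```

If several imports can run at the same time, a batch identifier column on the staging table (or a temporary table per connection) would be needed instead of the TRUNCATE.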
Disclaimer: I'm the owner of the project Bulk Operations
The Bulk Operations Library lets you insert, delete, update and merge millions of rows in a few seconds.
It's very easy to learn and use if you already know the SqlBulkCopy class.
var bulk = new BulkOperation(connection);
// ... Mappings ....
bulk.BulkUpdate(dt);
I have a DataSet with two TableAdapters (1-to-many relationship) that was created using Visual Studio 2010's Configuration Wizard.
I make a call to an external source and populate a Dictionary with the results. These results should be all of the entries in the database. To synchronize the DB I don't want to just clear all of the tables and then repopulate them, like dropping the tables and recreating them with new data in SQL.
Is there a clean way, possibly using the TableAdapter.Fill() method, or do I have to loop through the two tables row by row, decide whether each row stays or gets deleted, and then add the new entries? What is the best approach to make the data in the dictionary the only data in my two tables with the DataSet?
First question: if it's the same DB, why do you have 2 tables with the same information?
To the question at hand: that largely depends on the sizes. If the tables are not big, then use a transaction, clear the table (DELETE FROM table or whatever) and write your data in there again.
If, on the other hand, the tables are big, the question is: can you load all of this into your dictionary?
Of course you have to ask yourself what happens to inconsistent data (another user/app changed the data while you had it in your dictionary).
If this takes too long, you could remember what you did to the data: flag the changed data, remember the deleted keys and the newly inserted rows, and make your updates based on that.
Both can be achieved by keeping the filled DataTable and using it as a backing field, or by implementing your own mechanism.
In any case, I would recommend thinking about the problem: do you really need the dictionary? Why not make queries against the database to get the data? Or only cache a part of the data for quick access?
PS: the Update method on your DataAdapter will do all the work (changing the changed, removing the deleted and inserting the new data rows), but it will update the DataTable/DataSet, so this will only work once.
It could be that it is quicker to repopulate the entire table than to iterate through it and decide which records go or stay. Could you not decide whether a record is deleted via an SQL statement? (delete from table where active = false), or, if you want the records to stay in the database but not in the DataSet, (select * from table where active = true)
You could have a date field and select all records that have been added since you last polled the database (select * from table where active = true and date-added > #12:30#)