I have an old database that I'm migrating to a new one. The new one has a slightly different but mostly compatible schema. Additionally, I want to renumber all tables from zero.
Currently I am using a tool I wrote that retrieves each old record, inserts it into the new database, and updates a v2 ID field in the old database with the record's corresponding ID in the new database.
For example, I'm selecting from MV5.Posts and inserting into MV6.Posts. After the insert, I retrieve the ID of the new row in MV6.Posts and store it in the old MV5.Posts.MV6ID field.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually? I'm using SQL Server 2005, dev edition.
The key with migration is to do several things:
First, do not do anything without a current backup.
Second, if the keys will be changing, you need to store both the old and new in the new structure at least temporarily (Permanently if the key field is exposed to the users because they may be searching by it to get old records).
Next you need to have a thorough understanding of the relationships to child tables. If you change the key field all related tables must change as well. This is where having both old and new key stored comes in handy. If you forget to change any of them, the data will no longer be correct and will be useless. So this is a critical step.
Pick out some test cases of particularly complex data making sure to include one or more test cases for each related table. Store the existing values in work tables.
To start the migration, insert into the new table using a select from the old table. Depending on the number of records, you may want to loop through batches (not one record at a time) to improve performance. If the new key is an identity, simply put the value of the old key in the old-key field and let the database create the new keys.
Then do the same with the related tables. Then use the old key value in the table to update the foreign key fields with something like:
Update t2
set fkfield = newkey
from table2 t2
join table1 t1 on t1.oldkey = t2.fkfield
Test your migration by running the test cases and comparing the data with what you stored from before the migration. It is utterly critical to thoroughly test migration data or you can't be sure the data is consistent with the old structure. Migration is a very complex action; it pays to take your time and do it very methodically and thoroughly.
Probably the simplest way would be to add a column on MV6.Posts for oldId, then insert all the records from the old table into the new table. Last, update the old table matching on oldId in the new table with something like:
UPDATE o
SET newid = n.id
FROM mv5.posts o
JOIN mv6.posts n ON o.id = n.oldid
You could clean up and drop the oldId column afterwards if you wanted to.
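A minimal runnable sketch of that two-statement approach, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names (old_posts, new_posts, old_id, new_id, title) are made up for illustration, and the write-back uses a correlated subquery rather than T-SQL's UPDATE ... FROM so that it runs on any SQLite version:

```python
import sqlite3

def migrate_posts(conn):
    """Copy old_posts into new_posts, then write each generated ID back to the old row."""
    cur = conn.cursor()
    # One set-based insert: new_posts.id is generated, old_id remembers the source row.
    cur.execute(
        "INSERT INTO new_posts (old_id, title) "
        "SELECT id, title FROM old_posts ORDER BY id"
    )
    # One set-based update: look up the new key through the stored old key.
    cur.execute(
        "UPDATE old_posts SET new_id = "
        "(SELECT n.id FROM new_posts n WHERE n.old_id = old_posts.id)"
    )
    conn.commit()

# Hypothetical schema and data, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_posts (id INTEGER PRIMARY KEY, title TEXT, new_id INTEGER);
    CREATE TABLE new_posts (id INTEGER PRIMARY KEY AUTOINCREMENT, old_id INTEGER, title TEXT);
    INSERT INTO old_posts (id, title) VALUES (17, 'first'), (42, 'second'), (99, 'third');
""")
migrate_posts(conn)
mapping = conn.execute("SELECT id, new_id FROM old_posts ORDER BY id").fetchall()
```

Here mapping holds one (old ID, new ID) pair per post, and no row was processed individually by the client.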
The best you can do that I know is with the output clause. Assuming you have SQL 2005 or 2008.
USE AdventureWorks;
GO
DECLARE @MyTableVar table( ScrapReasonID smallint,
Name varchar(50),
ModifiedDate datetime);
INSERT Production.ScrapReason
OUTPUT INSERTED.ScrapReasonID, INSERTED.Name, INSERTED.ModifiedDate
INTO @MyTableVar
VALUES (N'Operator error', GETDATE());
It would still require a second pass to update the original table; however, it might help make your logic simpler. Do you need to update the source table? You could just store the new IDs in a third cross-reference table.
Heh. I remember doing this in a migration.
Putting the old_id in the new table makes both the insert (you can just do an insert into newtable select ... from oldtable) and the subsequent "stitching" of records easier. In the "stitch" you either set the child tables' foreign keys during the insert, by doing a subselect on the new parent (insert into newchild select ... (select id from new_parent where old_id = oldchild.fk) as fk, ... from oldchild), or you insert the children and do a separate update to fix the foreign keys.
Doing it in one insert is faster; doing it in a separate step means that your inserts aren't order dependent and can be redone if necessary.
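The one-insert variant of the "stitch" could look roughly like this. It is a sketch run against SQLite through Python's sqlite3 module rather than SQL Server, and every table and column name in it is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical old and new schemas; the new tables carry an old_id column.
    CREATE TABLE old_parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE old_child (id INTEGER PRIMARY KEY, parent_id INTEGER, body TEXT);
    CREATE TABLE new_parent (id INTEGER PRIMARY KEY AUTOINCREMENT, old_id INTEGER, name TEXT);
    CREATE TABLE new_child (id INTEGER PRIMARY KEY AUTOINCREMENT, old_id INTEGER,
                            parent_id INTEGER, body TEXT);
    INSERT INTO old_parent VALUES (10, 'a'), (20, 'b');
    INSERT INTO old_child VALUES (7, 20, 'x'), (8, 10, 'y'), (9, 20, 'z');

    -- Parents first, remembering where each row came from.
    INSERT INTO new_parent (old_id, name) SELECT id, name FROM old_parent;

    -- Children next: the subselect translates the old foreign key into the new one.
    INSERT INTO new_child (old_id, parent_id, body)
    SELECT c.id,
           (SELECT p.id FROM new_parent p WHERE p.old_id = c.parent_id),
           c.body
    FROM old_child c;
""")
stitched = conn.execute("""
    SELECT c.body, p.name
    FROM new_child c JOIN new_parent p ON p.id = c.parent_id
    ORDER BY c.body
""").fetchall()
```

Because the subselect needs the new parent rows to exist, this variant is order dependent, which is the trade-off mentioned above.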
After the migration, you can either drop the old_id columns or, if the legacy system exposed the IDs and users used the keys as data, keep them to allow lookups based on the old_id.
Indeed, if you have the foreign keys correctly defined, you can use systables/information-schema to generate your insert statements.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually?
Since you wouldn't want to do it manually, but automatically, create a trigger on MV6.Posts so that UPDATE occurs on MV5.Posts automatically when you insert into MV6.Posts.
And your trigger might look something like,
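create trigger trg_MV6Posts
on MV6.Posts
after insert
as
begin
-- Assumes MV6.Posts carries the old key in an OldMV5Id column (not in the original schema).
-- An identity column cannot be updated, so the new ID is written to MV5.Posts.MV6ID instead.
update P
set MV6ID = I.ID
from MV5.Posts P
join inserted I on I.OldMV5Id = P.ID
end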
AFAIK, you cannot update two different tables with a single SQL statement.
You can however use triggers to achieve what you want to do.
Add a column MV6.Post.OldMV5Id,
then run an
insert into MV6.Post
select .. from MV5.Post
then update MV5.Post.MV6ID by joining the two tables on OldMV5Id.
Related
I would like to know what is the best approach for creating a historical table for some table and automatically move deleted rows to this new table with same columns + deleted time.
For example:
When I delete a row from a PRODUCT table it will move to PRODUCT_H table with deleted time column.
Thank you for your time.
The best option is to set up a trigger in the database, something like this:
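CREATE TRIGGER movetohistorical
ON dbo.PRODUCT
FOR DELETE
AS
-- By the time this trigger fires the rows are already gone from PRODUCT,
-- so read them from the deleted pseudo-table instead.
-- Assumes Product_H has PRODUCT's columns followed by the deleted-time column.
INSERT Product_H
SELECT *, GETDATE() FROM deleted
GO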
The easiest way to do it is to implement a trigger in the database.
You can create the trigger using CREATE TRIGGER.
You don't have to worry about the trigger in your application code.
The trigger should be an AFTER DELETE trigger, which will execute whenever a row (or several rows) is (are) deleted.
You can read this article, which implements nearly what you need (it implements a history table, but doesn't record the current datetime): SQL Server: Coding the After Delete Trigger in SQL Server. In fact, you only have to make a small change. In the sample, the insertion into the history table uses SELECT * FROM .... To do what you need, simply add the GETDATE() function like this: SELECT *, GETDATE() FROM .... Of course, the destination table must have this date column at the end (if something goes wrong, specify the column names explicitly instead of using the star).
Any other option will imply adding code to your application, and will require extra communication between your app and the SQL Server.
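To see the idea end to end without a SQL Server instance, here is a small sketch using SQLite through Python's sqlite3 module. It is not T-SQL: SQLite triggers fire once per row and expose the removed row as OLD, whereas SQL Server fires once per statement and exposes the rows through the deleted table. The names product, product_h and deleted_at are placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    -- History table: same columns plus the deletion time at the end.
    CREATE TABLE product_h (id INTEGER, name TEXT, deleted_at TEXT);

    -- Copy every deleted row into the history table, stamped with the current time.
    CREATE TRIGGER move_to_historical AFTER DELETE ON product
    BEGIN
        INSERT INTO product_h (id, name, deleted_at)
        VALUES (OLD.id, OLD.name, datetime('now'));
    END;

    INSERT INTO product (id, name) VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    DELETE FROM product WHERE id IN (1, 3);
""")
remaining = conn.execute("SELECT id FROM product").fetchall()
history = conn.execute(
    "SELECT id, name, deleted_at FROM product_h ORDER BY id"
).fetchall()
```

The application only issues the DELETE; the history rows appear without any extra application code.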
We are converting database primary keys from GUIDs to auto-incremented INTs. We have data that we parse from text files and put into two C# DataTables Claim and ClaimCharge that we have been using to bulk insert into identically named tables in the database. In the database, ClaimCharge.ClaimID is a foreign key to Claim.ID and several claim charges exist for one claim.
With GUIDs we generated the Claim and ClaimCharge IDs in C#, so bulk inserting was no problem. But with INTs, I don't know what the Claim.ID will be, so I can't assign ClaimCharge.ClaimID. I need some ideas on how this could be accomplished with INTs.
For instance, if the Claim table could be manually locked against inserts, I could:
1. Bulk insert into alternate tables named ClaimBulkData and ClaimChargeBulkData. These tables would still use GUIDs for convenience in keeping the relationship maintained between C# and SQL.
2. Manually lock the Claim table against inserts (don't know if this is possible) and get the MAX(ID).
3. Increment all of the data in ClaimBulkData using MAX(ID).
4. Associate ClaimChargeBulkData to ClaimBulkData using the newly updated INT.
5. Insert data into the real Claim table as a set using IDENTITY_INSERT ON, using some kind of exception to the imaginary lock created in step 2.
6. Release the manually created lock against inserts on the Claim table (again, I don't know if this is possible).
7. Insert data into the real ClaimCharge table.
I want to avoid inserting the data one row at a time in either C# or T-SQL.
Why not just add the new auto-increment column to the master tables -- you will then have both GUID and autoid column so you can fix up the foreign key relationship (one master table at a time)
i.e.,
Assume you have Master1, Detail1 and Detail2:
alter table Master1 add ID int identity(1,1) not null
GO
alter table Detail1 add master1ID int null
GO
alter table Detail2 add master1ID int null
GO
Then update Detail1 and Detail2 by joining to Master1 on the old GUID key, setting the corresponding value of Master1ID in each table.
You can then add the foreign keys based on Master1ID to Detail1 and Detail2.
At this point you should have a complete set of data based on both sets of keys, and you can test update views, etc. to make sure they work with the new integer ids
Finally, once all is cool, drop the unneeded GUID foreign keys and the GUID columns themselves.
You can always run a database pack once you get everything clean and converted if your intent was to reduce overall disk usage via this restructuring. The point is much of the work is fixups for foreign keys in a process like this.
In my ASP.NET web app I'm trying to implement an import/export procedure to save or insert data in the application DB. My procedure generates some CSV files: one for each table.
Obviously there are relations between some of these tables and when I import CSV in my DB I'd like to maintain association between rows.
Say I have Table1 and Table2 with Table2 that has a foreign key to Table1. So I could have a row in Table1 with ID = 100 and a row in Table2 with Table1_ID = 100.
When I import CSV with Table1 data, new IDs are generated for Table1 rows, how can I maintain consistency of the foreign keys in Table2 when I import the corresponding CSV file?
I'm using Linq-to-SQL to retrieve data from the DB. Could DataSet and DataTable help me here?
NOTE I'd like to permit cumulative import, so when I import a CSV file there may already be data in the DB. So I cannot use 'Set Identity OFF'.
Add the items of Table1 first, so that when you add the items of Table2 the corresponding records of Table1 are already in the database. For more tables you will have to figure out the order. If you are building a system for an arbitrary database schema, you will want to create a table graph in memory (where each node is a table and each arc is a foreign key) [there are no types for that in the base library] and then convert it to a tree such that you get the correct order by traversing the tree (breadth-first).
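One way to get such an order is a topological sort of the table graph, which is a slightly different technique from building a tree and walking it breadth-first, but it gives the same guarantee: every referenced table comes before the tables that reference it. A sketch in Python (3.9 or later, for the standard graphlib module), with a made-up three-table schema:

```python
from graphlib import TopologicalSorter

def import_order(foreign_keys):
    """Return table names so every referenced table precedes the tables that reference it.

    foreign_keys maps each table name to the set of tables its foreign keys point at.
    A circular reference raises graphlib.CycleError.
    """
    return list(TopologicalSorter(foreign_keys).static_order())

# Hypothetical schema: Table2 references Table1, Table3 references both.
schema = {
    "Table1": set(),
    "Table2": {"Table1"},
    "Table3": {"Table1", "Table2"},
}
order = import_order(schema)
```

With this schema the only valid order is Table1, Table2, Table3; when several orders are valid, any of them is safe to import in.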
You can let the database handle the cases where a foreign key is violated because there is no such record. You will have to decide whether to wrap the whole import operation in one transaction, or use one per item.
Analysing the CSVs beforehand is also possible. To do that, store the primary key values of each table [use a set for that] (again, iterating over the tables in the correct order). Then, when you read a table that has a foreign key to a table you have already read, you can check whether the key is there; this also helps you detect possible duplicates. [If you have to take existing database contents into account, you would have to query too... although take care if the database is in an active system, where records could be deleted while you are still deciding whether you can add the CSVs without problems.]
To address that you are generating new IDs when you add...
The simplest solution that I can think of is: don't. In particular if it is an active system, where other requests are being processed, because then there is no way to predict the new IDs beforehand. Your best bet would be to add them one by one; in that case, you will have to plan your transaction strategy accordingly... it may be the case that you will not be able to roll back.
Although, I think your question is a bit deeper: If the ID of the Table1 did change, then how can I update the corresponding records in the Table2 so they point to the correct record in Table1?
To do that, I suggest doing the analysis I described above; you will then have a group of sets that work as indexes. These will help you locate the records that you need to update in Table2 for each ID in Table1. [It is also important to keep track of whether you have already updated a record, and not do it twice, because a generated ID may match an ID that is yet to be sent to the database.]
To roll back, you can also use those sets, as they will end up having the new IDs that identify the records that you will have to pull out of the database if you want to abort the operation.
Edit: those sets (I recommend HashSet) are only half the story, because they only hold the primary keys (for instance, ID in Table1). You will need bags to keep the foreign keys (in this case Table1_ID in Table2).
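A variation that avoids updating Table2 after the fact is to keep a map from each CSV ID to the ID the database generated, and translate the foreign key before inserting the child row. The sketch below shows this with Python's csv and sqlite3 modules instead of Linq-to-SQL; the table layout, column names and CSV contents are invented for the example, and the rows are inserted one by one as suggested above:

```python
import csv
import io
import sqlite3

def import_with_remap(conn, table1_csv, table2_csv):
    """Import both CSV texts, translating Table2's foreign keys to the newly generated IDs."""
    id_map = {}  # ID as written in the CSV -> ID generated by the database
    for row in csv.DictReader(io.StringIO(table1_csv)):
        cur = conn.execute("INSERT INTO table1 (name) VALUES (?)", (row["name"],))
        id_map[row["id"]] = cur.lastrowid
    for row in csv.DictReader(io.StringIO(table2_csv)):
        # Look the parent up by its CSV ID; a missing key raises KeyError before any bad insert.
        conn.execute(
            "INSERT INTO table2 (table1_id, note) VALUES (?, ?)",
            (id_map[row["table1_id"]], row["note"]),
        )
    conn.commit()
    return id_map

# Hypothetical target database that already contains data (cumulative import).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY,
                         table1_id INTEGER REFERENCES table1(id), note TEXT);
    INSERT INTO table1 (name) VALUES ('already here');
""")
TABLE1_CSV = "id,name\n100,alpha\n101,beta\n"
TABLE2_CSV = "id,table1_id,note\n5,101,for beta\n6,100,for alpha\n"
id_map = import_with_remap(conn, TABLE1_CSV, TABLE2_CSV)
pairs = conn.execute("""
    SELECT t2.note, t1.name
    FROM table2 t2 JOIN table1 t1 ON t1.id = t2.table1_id
    ORDER BY t2.note
""").fetchall()
```

Since old and new IDs never share a column, a generated ID that happens to equal a not-yet-imported CSV ID cannot cause a double update.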
I'm stuck on a little problem concerning database.
Once a month I get an XML file with customer information (name, address, city, etc.). My primary key is a customer number which is provided in the XML file.
I have no trouble inserting the information in the database;
var cmd = new SqlCommand("insert into [customer_info]
(customer_nr, firstname, lastname, address_1, address_2, address_3.......)");
//some code
cmd.ExecuteNonQuery();
Now, I would like to update my table or just fill it with new information. How can I achieve this?
I've tried using TableAdapter but it does not work.
And I'm only permitted to add one XML because I can only have one customer_nr as primary key.
So basically how do I update or fill my table with new information?
Thanks.
One way would be to bulk insert the data into a new staging table in the database (you could use SqlBulkCopy for this for optimal insert speed). Once it's in there, you could then index the customer_nr field and then run 2 statements:
-- UPDATE existing customers
UPDATE ci
SET ci.firstname = s.firstname,
ci.lastname = s.lastname,
... etc
FROM StagingTable s
INNER JOIN Customer_Info ci ON s.customer_nr = ci.customer_nr
-- INSERT new customers
INSERT Customer_Info (customer_nr, firstname, lastname, ....)
SELECT s.customer_nr, s.firstname, s.lastname, ....
FROM StagingTable s
LEFT JOIN Customer_Info ci ON s.customer_nr = ci.customer_nr
WHERE ci.customer_nr IS NULL
Finally, drop your staging table.
Alternatively, instead of the 2 statements, you could just use the MERGE statement if you are using SQL Server 2008 or later, which allows you to do INSERTs and UPDATEs via a single statement.
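The staging-table flow (without MERGE) can be tried out with a sketch like the following, which uses SQLite via Python's sqlite3 module in place of SQL Server and SqlBulkCopy. Only three columns are modelled, the table name staging is a placeholder, and the UPDATE is written with correlated subqueries instead of T-SQL's UPDATE ... FROM:

```python
import sqlite3

def upsert_from_staging(conn, rows):
    """Load rows into a staging table, update existing customers, insert new ones."""
    cur = conn.cursor()
    # The primary key on customer_nr doubles as the index used by the joins below.
    cur.execute(
        "CREATE TABLE staging (customer_nr INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
    )
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    # UPDATE existing customers.
    cur.execute("""
        UPDATE customer_info
        SET firstname = (SELECT s.firstname FROM staging s
                         WHERE s.customer_nr = customer_info.customer_nr),
            lastname  = (SELECT s.lastname FROM staging s
                         WHERE s.customer_nr = customer_info.customer_nr)
        WHERE customer_nr IN (SELECT customer_nr FROM staging)
    """)
    # INSERT new customers.
    cur.execute("""
        INSERT INTO customer_info (customer_nr, firstname, lastname)
        SELECT s.customer_nr, s.firstname, s.lastname
        FROM staging s
        LEFT JOIN customer_info ci ON s.customer_nr = ci.customer_nr
        WHERE ci.customer_nr IS NULL
    """)
    # Finally, drop the staging table.
    cur.execute("DROP TABLE staging")
    conn.commit()

# Hypothetical existing data and a monthly delivery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_info (customer_nr INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    INSERT INTO customer_info VALUES (1, 'Ann', 'Old'), (2, 'Bob', 'Same');
""")
upsert_from_staging(conn, [(1, "Ann", "New"), (3, "Cy", "Fresh")])
customers = conn.execute("SELECT * FROM customer_info ORDER BY customer_nr").fetchall()
```

Customer 1 is updated, customer 2 is left alone because it is not in the delivery, and customer 3 is inserted.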
If I understand your question correctly - if the customer already exists you want to update their information, and if they don't already exist you want to insert a new row.
I have a lot of problems with hard-coded SQL commands in your code, so I would firstly be very tempted to refactor what you have done. However, to achieve what you want, you will need to execute a SELECT on the primary key; if it returns any results you should execute an UPDATE, else you should execute an INSERT.
It would be best to do this in something like a stored procedure: you can pass the information to the stored procedure and then it can decide whether to UPDATE or INSERT. This would also reduce the overhead of making several calls from your code to the database (a stored procedure would be much quicker).
AdaTheDev has indeed given a good suggestion.
But in case you must insert/update from .NET code, you can either:
Create a stored procedure that will handle the insert/update, i.e. instead of using a direct insert query as command text, you make a call to the stored proc. The SP will check whether the row exists and then update (or insert).
Use TableAdapter, but this would be tedious. First you have to set up both insert and update commands. Then you have to query the database to get the existing customer numbers and update the corresponding rows in the DataTable, setting their RowState to Modified. I would rather not go this way.
A database exists with two tables
Data_t : DataID Primary Key that is Identity 1,1. Also has another field 'LEFT' TINYINT
Data_Link_t : DataID PK and FK where DataID MUST exist in Data_t. Also has another field 'RIGHT' SMALLINT
Coming from a Microsoft Access environment into C# and SQL Server, I'm looking for a good method of importing a record into this relationship.
The record contains information that belongs on both sides of this join (Possibly inserting/updating upwards 5000 records at once). Bonus to process the entire batch in some kind of LINQ list type command but even if this is done record by record the key goal is that BOTH sides of this record should be processed in the same step.
There are countless approaches and I'm looking at too many to determine which way I should go so I thought faster to ask the general public. Is LINQ an option for inserting/updating a big list like this with LINQ to SQL? Should I go record by record? What approach should I use to add a record to normalized tables that when joined create the full record?
Sounds like a case where I'd write a small stored proc and call that from C# - e.g. as a function on my Linq-to-SQL data context object.
Something like:
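CREATE PROCEDURE dbo.InsertData(@Left TINYINT, @Right SMALLINT)
AS BEGIN
DECLARE @DataID INT
-- LEFT and RIGHT are reserved words, so the column names need brackets
INSERT INTO dbo.Data_t([LEFT]) VALUES(@Left)
-- capture the identity value generated by the insert above
SELECT @DataID = SCOPE_IDENTITY();
INSERT INTO dbo.Data_Link_T(DataID, [RIGHT]) VALUES(@DataID, @Right)
END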
If you import that into your data context, you could call this something like:
using(YourDataContext ctx = new YourDataContext())
{
foreach(YourObjectType obj in YourListOfObjects)
{
ctx.InsertData(obj.Left, obj.Right);
}
}
and let the stored proc handle all the rest (all the details, like determining and using the IDENTITY from the first table in the second one) for you.
I have never tried it myself, but you might be able to do exactly what you are asking for by creating an updateable view and then inserting records into the view.
UPDATE
I just tried it, and it doesn't look like it will work.
Msg 4405, Level 16, State 1, Line 1
View or function 'Data_t_and_Data_Link_t' is not updatable because the modification affects multiple base tables.
I guess this is just one more thing for all the Relational Database Theory purists to hate about SQL Server.
ANOTHER UPDATE
Further research has found a way to do it. It can be done with a view and an "instead of" trigger.
create table Data_t
(
DataID int not null identity primary key,
[LEFT] tinyint,
)
GO
create table Data_Link_t
(
DataID int not null primary key foreign key references Data_T (DataID),
[RIGHT] smallint,
)
GO
create view Data_t_and_Data_Link_t
as
select
d.DataID,
d.[LEFT],
dl.[RIGHT]
from
Data_t d
inner join Data_Link_t dl on dl.DataID = d.DataID
GO
create trigger trgInsData_t_and_Data_Link_t on Data_t_and_Data_Link_T
instead of insert
as
-- Only correct for single-row inserts: @@IDENTITY holds just the last identity value generated.
insert into Data_t ([LEFT]) select [LEFT] from inserted
insert into Data_Link_t (DataID, [RIGHT]) select @@IDENTITY, [RIGHT] from inserted
go
insert into Data_t_and_Data_Link_t ([LEFT],[RIGHT]) values (1, 2)