How to copy data from one database to another with high performance? - c#

I have a conceptual question.
I have two databases which have the same structure. One database already contains a lot of data. This data should be transferred to the other database via SELECT and INSERT.
How can I do this data migration with the highest performance?
My first approach was to sort all the tables into a list, where tables that contain foreign keys are placed after the tables they reference. But with this solution it is impossible to process the tables in parallel.
The second idea was to create a custom type which contains the table name, the names of the referenced tables, and a bool flag that stores whether the table's data has been copied. An instance of this type is stored in a list for each table. Then I start a new thread per table that checks, before copying, whether the referenced tables have already been copied. If not, I execute Thread.Sleep(), after which I check again (rough sketch below).
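Roughly, the second idea looks like this (type and member names are just illustrative, and CopyTable stands for the actual copy routine):
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class TableCopyInfo
{
    public string TableName;
    public List<string> ReferencedTables;
    public volatile bool Copied; // set once this table's data has been copied
}

static class Migrator
{
    // one worker per table: wait until all referenced tables are done, then copy
    public static void CopyWhenReady(TableCopyInfo table, IDictionary<string, TableCopyInfo> all)
    {
        while (table.ReferencedTables.Any(name => !all[name].Copied))
            Thread.Sleep(100); // re-check after a short pause
        CopyTable(table.TableName); // the actual SELECT/INSERT copy (not shown)
        table.Copied = true;
    }

    static void CopyTable(string tableName) { /* SELECT from old db, INSERT into new */ }
}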
Is there a well-performing approach to this problem?
Any suggestions will be helpful.
EDIT:
The old database is a SQLBase database.
The new database is a Microsoft SQL Server database.

You may use either:
1) SQL Server Replication
or
2) a SQL Server MERGE statement
You may use SQL Server linked servers to connect different database platforms (e.g. SQL Server, MySQL, DB2, ...)
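For example, once a linked server pointing at the source is configured, the whole copy for one table can be driven from C# with a single INSERT ... SELECT (the linked server, database, and table names below are made up):
using System.Data.SqlClient;

// connect to the *target* SQL Server; the linked server pulls the rows across
using (var conn = new SqlConnection(
    "Server=NewServer;Database=TargetDb;Integrated Security=true"))
{
    conn.Open();
    var cmd = new SqlCommand(
        @"INSERT INTO dbo.Customers (Id, Name)
          SELECT Id, Name FROM OLD_DB_LINK.SourceDb.dbo.Customers", conn);
    cmd.CommandTimeout = 0; // large copies can exceed the 30-second default
    cmd.ExecuteNonQuery();
}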

Best advice: stick with SQLBase.
Second best advice: use the SQLBase UNLOAD command via SQLTalk. This will write all the DDL statements required to recreate the database elsewhere - including triggers, stored procedures, indexes, etc. - plus, if you use the right options, all the data to load, to an external file. This file can then optionally be edited programmatically, if need be, into SQL Server format (not much difference). There are many options to the UNLOAD command which can't be covered here in detail, but here's a link to the syntax.
Note that SQLBase v12 has been released in recent months and performance has increased tenfold. With the right tuning, indexes, etc., it will outstrip SQL Server in terms of efficiency and performance. On a 100 GB database our response times have gone from 50 seconds to 3 seconds with no additional work.

Related

Copying Data from Oracle Server to SQL Server

I'm quite new to coding in general and I'm looking to copy 47 columns with roughly 300,000 rows of data from an Oracle database to a SQL Server database on a daily basis. The code will run as a Windows service at the same time every day (or, more likely, night).
The data from the Oracle DB table (let's call this Oracle_Source) will be used both to append to a history table (call this SQL_History) and to append new / update matching / delete missing rows in a live table (call this SQL_Live). The two databases are housed on different servers, but the two SQL tables are in the same database.
I have a few questions around the best way to approach this.
Using VB/C#, is it faster to loop through the rows of Oracle_Source (either one by one or in batches of 100/1000/etc.) and insert/update SQL_History/SQL_Live, OR to copy the entire Oracle_Source table in one go and insert it into the SQL tables? Previously I have used a loop to download data into a .csv.
Using the more efficient of the above methods, would it be faster to work on both SQL tables simultaneously, OR to copy the data into the SQL_History table and then use that to append/update/delete rows in the SQL_Live table?
Am I approaching this completely wrong?
Any other advice available is also much appreciated.
The correct question is "What is the fastest way to copy the table?"
In your specific case, with two different servers and a "big" table to copy, you are probably limited by network I/O.
So the first point is to touch only the rows that must change (update / insert / delete), so fewer bytes have to move.
To answer your first point: you have to use transactions to improve write speed on SQL Server. The size of each transaction depends on different factors (db, machine, ...), but I usually make transactions of 500/1000 simple commands. In my personal experience, you can also use multi-row INSERT statements; sending 500 rows per INSERT causes no performance issue.
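As a rough sketch of that batching idea from C# (the connection string, table, column names, and row source are placeholders):
using System.Data.SqlClient;

// commit in transactions of ~500 commands instead of one per row
using (var conn = new SqlConnection(targetConnectionString))
{
    conn.Open();
    var tx = conn.BeginTransaction();
    int pending = 0;
    foreach (var row in rowsFromOracle) // already read from the source
    {
        var cmd = new SqlCommand(
            "INSERT INTO dbo.SQL_History (Id, Payload) VALUES (@id, @payload)",
            conn, tx);
        cmd.Parameters.AddWithValue("@id", row.Id);
        cmd.Parameters.AddWithValue("@payload", row.Payload);
        cmd.ExecuteNonQuery();
        if (++pending == 500) { tx.Commit(); tx = conn.BeginTransaction(); pending = 0; }
    }
    tx.Commit(); // commit the final partial batch
}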
In my experience, a bulk copy is faster than efficient INSERT, UPDATE and DELETE statements, because the db does not calculate keys and does not check for duplicate rows.
Better explained (see the sketch after these steps):
you TRUNCATE all data,
DISABLE keys,
do a massive INSERT of all rows, and
re-ENABLE keys.
This is the fastest way to copy a table, but if the communication is between different servers over a slow network, it may not be the best choice.
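A minimal sketch of that sequence against SQL Server (table and index names are invented; "disabling keys" is done here by disabling nonclustered indexes and constraint checks):
using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(targetConnectionString))
{
    conn.Open();
    void Exec(string sql) => new SqlCommand(sql, conn).ExecuteNonQuery();

    Exec("TRUNCATE TABLE dbo.Target");                        // 1) empty the table
    Exec("ALTER INDEX IX_Target_Name ON dbo.Target DISABLE"); // 2) disable keys
    Exec("ALTER TABLE dbo.Target NOCHECK CONSTRAINT ALL");

    // 3) massive INSERT: stream all rows in one bulk operation
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Target" })
        bulk.WriteToServer(sourceTable); // a DataTable filled from the source

    Exec("ALTER TABLE dbo.Target WITH CHECK CHECK CONSTRAINT ALL"); // 4) re-enable
    Exec("ALTER INDEX IX_Target_Name ON dbo.Target REBUILD");
}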
Obviously, the best choice depends on your infrastructure and the table size.
For example:
If you have one server on your LAN and the second server in the cloud, the bottleneck is the speed of the internet connection and you must pay more attention to efficient communication (fewer bytes).
If both servers are on your LAN with gigabit connections, the usable network throughput is probably around 100 MB/s, and you can simply move all the table rows without headache.

Copy data between two SQL Server databases in C#

I have two databases. One of them belongs to a CRM software and is the source.
The other one will be the destination used by a tool I'm developing.
The destination will contain a table ADDRESSES with a subset of the columns of a table of the same name in the source database.
What is the best (most efficient) way to copy the data between those databases? (BTW: they're on different SQL Server instances, if that's important.)
I could write a loop which does an INSERT into the destination for each row obtained from the source, but I don't think that this is efficient.
My thoughts and information:
The data won't be altered on its way from source to destination
It will be altered on its way back
I don't have the complete structure of the source, but I know which fields I need and that they're guaranteed to be in the source (hence, accessing the rows obtained from the source by column index isn't possible)
I can't use LINQ.
Anything leading me in the right direction here is appreciated.
Edit:
I really need a C# way to copy the data. I also need to know how to merge the copied rows back into the source. Is it really necessary (or even best practice) to do this row by row?
Why write code to do this?
The single fastest and easiest way is just to use SQL Server's bcp.exe utility (bcp: Bulk Copy Program); example commands follow the steps below.
Export the data from the source server.
Zip it or tar it if it needs it.
FTP it over to where it needs to go, if you need to move it to another box.
Import it into the destination server.
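For example (server, database, and file names are made up; -T uses Windows authentication, -n keeps SQL Server's native format):
bcp SourceDb.dbo.ADDRESSES out addresses.dat -S SourceServer -T -n
bcp DestDb.dbo.ADDRESSES in addresses.dat -S DestServer -T -n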
You can accomplish the same thing via SQL Server Management Studio in a number of different ways. Once you've defined the task, it can be saved and scheduled.
You can use SQL Server's PowerShell objects to do this as well.
If you're set on doing it in C# (a sketch follows these steps):
write your select query to get the data you want from the source server.
execute that and populate a temp file with the output.
execute SQL Server's bulk insert statement against the destination server to insert the data.
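A rough sketch of those three steps (connection strings, file path, and column names are placeholders; the file path must be readable by the destination SQL Server):
using System.Data.SqlClient;
using System.IO;

// 1) select the wanted rows from the source server into a temp file
using (var src = new SqlConnection(sourceConnectionString))
using (var writer = new StreamWriter(@"C:\temp\addresses.txt"))
{
    src.Open();
    var reader = new SqlCommand(
        "SELECT Id, Street, City FROM dbo.ADDRESSES", src).ExecuteReader();
    while (reader.Read())                       // 2) populate the temp file
        writer.WriteLine($"{reader["Id"]}\t{reader["Street"]}\t{reader["City"]}");
}

// 3) bulk insert the file on the destination server
using (var dest = new SqlConnection(destConnectionString))
{
    dest.Open();
    new SqlCommand(
        @"BULK INSERT dbo.ADDRESSES FROM 'C:\temp\addresses.txt'
          WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')", dest)
        .ExecuteNonQuery();
}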
Note: For any of these techniques, you'll need to deal with identity columns if the target table has them. You'll also need to deal with key collisions. It is sometimes easier to bulk load the data into a perma-temp table first, and then apply the prerequisite transforms and manipulations to get it to where it needs to go.
According to your comment on Jwrit's answer, you want two-way sync.
If so, you might want to look into Microsoft Sync Framework.
We use it to sync 200+ tables from on-premise SQL Server to SQL Azure and from SQL Azure to SQL Azure.
It can be used purely from C#. However, it might offer a lot more than you want, or it might be overkill for a small project.
I'm just mentioning it so that you have another option for your project.
If these databases exist on two servers, you can set up a link between the servers by executing sp_addlinkedserver; there are instructions for setting this up here. This may come in handy if you plan on regularly "sharing" data.
http://msdn.microsoft.com/en-us/library/ff772782.aspx
Once the servers are linked, a simple INSERT ... SELECT statement can copy the rows from one table to another:
INSERT INTO db1.dbo.tblA (Field1, Field2)
SELECT Field1, Field2 FROM SERVER2.db2.dbo.tblB
If the databases are on the same instance, you only need similar SQL to the above, minus the linked server name.
If this is a one-time job, the best bet is normally SSIS (SQL Server Integration Services). Unless there are complex data transformations, you can quickly and easily do the column mappings and have it done (reliably) in 15 minutes flat...

Data transfer from One database to another

I am looking for an idea or some direction. I have to transfer data from one database to another; both are structurally and schema-wise the same.
It's a complete database with maybe 70 tables, with relationships between tables at different levels. Even though I'm going to mess up the identity values when I move across databases, as of now I am OK with that.
The idea I had was to load the required data from all tables into XML, then create a connection to the second database and push the data from this XML. That is kind of roundabout and not the best way at all, so I'm looking for direction.
Can I use Entity Framework for this somehow?
I cannot use SSIS for this; it has to be C#. Sorry.
You can create a linked server as stated in the comments to your question. You seemed to indicate that you know how to do this, but in case not: you can do this from SQL Server Management Studio by drilling down to "Server Objects > Linked Servers" under your source database server, then right-click, "New Linked Server", etc.
Then you would use a statement like this, for example, from your C# code:
insert into DestServer.DBName.dbo.TableName
select * from SourceServer.DBName.dbo.TableName
This assumes you are connected to 'SourceServer' and that 'SourceServer' maintains a linked server object pointing to 'DestServer'. Note: you don't actually need to use the fully-qualified name for the table on 'SourceServer', but I've put it there for clarity, i.e. you could also do this:
insert into DestServer.DBName.dbo.TableName
select * from TableName
Don't forget to set up the permissions properly in your linked server object so that your query can write to the table on the destination server. You can do this any number of ways, and often (because I work in a small environment that's maintained by just me and a couple of other folks) I just use the "sa" login.
Yes, you can use linked servers from .NET.
You just use the four-part name.
What you can do in T-SQL in SSMS, you can do in a .NET SqlCommand.
My experience is that you get better performance connecting to the server you are writing to.
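A minimal example of that from C# (server and database names as in the snippets above; this assumes the destination server has a linked server pointing back at 'SourceServer', and connects to the destination per the advice about writing):
using System.Data.SqlClient;

// connect to the destination server and pull via the four-part name
using (var conn = new SqlConnection(
    "Server=DestServer;Database=DBName;Integrated Security=true"))
{
    conn.Open();
    var cmd = new SqlCommand(
        @"INSERT INTO dbo.TableName
          SELECT * FROM SourceServer.DBName.dbo.TableName", conn);
    cmd.CommandTimeout = 0; // copies of large tables can run long
    cmd.ExecuteNonQuery();
}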

C# and SQL Server 2008 - Batch Update

I have a requirement where I need to update thousands of records in a live database table, and although there are many columns in this table, I only need to update 2-3 of them.
Further, I can't hit the database a thousand times just for the updates; this can be done as a batch update using a SQL Server table-valued parameter (TVP). But I also shouldn't update all the thousands of records in one go, for better error handling; instead I want to update records in batches of x*100.
So, below is my approach; please give your valuable inputs for any other alternatives or any change in the proposed process (a sketch of step 6 follows the list) -
1. Fetch the required records from the database into a List<T> MainCollection
2. Save this collection to an XML file, with each element's Status = Pending
3. Take the first 'n' elements from the XML file with Status = Pending and add them to a new List<T> SubsetCollection
4. Loop over List<T> SubsetCollection - make the required changes to T
5. Convert List<T> SubsetCollection to a DataTable
6. Call the update stored procedure and pass the above DataTable as a TVP
7. Update Status = Processed for the XML elements corresponding to List<T> SubsetCollection
8. If more records with Pending status exist in the XML file, go to step 3.
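A minimal sketch of step 6, assuming a user-defined table type dbo.RecordTvp and a stored procedure dbo.UpdateRecords with a @Records parameter (all three names are made up):
using System.Data;
using System.Data.SqlClient;

// subsetTable is the DataTable built in step 5; its columns must match
// the dbo.RecordTvp table type definition
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var cmd = new SqlCommand("dbo.UpdateRecords", conn)
    {
        CommandType = CommandType.StoredProcedure
    };
    var p = cmd.Parameters.AddWithValue("@Records", subsetTable);
    p.SqlDbType = SqlDbType.Structured;   // marks the parameter as a TVP
    p.TypeName = "dbo.RecordTvp";         // the user-defined table type
    cmd.ExecuteNonQuery();
}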
Please guide for a better approach or any enhancement in above process.
I would take a database-only approach if possible and, if that's not possible, eliminate the parts that will be the slowest. If you are unable to do all the work in a stored procedure, then retrieve all the records and make the changes in C#.
The next step is to write the changes to a staging table with SqlBulkCopy. This is a fast bulk loader that will copy thousands of records in seconds. You store the primary key and the columns to be updated, as well as a batch number. A batch number is assigned to each batch of records, allowing another batch to be loaded without conflicting with the first (see the sketch below).
Use a stored procedure on the server to process the records in batches of 100 or 1000, depending on performance. Pass the batch number to the stored procedure.
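A rough sketch of that flow (the staging table, procedure, and parameter names are invented):
using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // bulk load the changes: primary key, updated columns, batch number
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Staging" })
        bulk.WriteToServer(changesTable); // DataTable holding the changed rows

    // let the server apply one batch of the staged changes
    var cmd = new SqlCommand("dbo.ProcessBatch", conn)
    {
        CommandType = CommandType.StoredProcedure
    };
    cmd.Parameters.AddWithValue("@BatchNumber", batchNumber);
    cmd.ExecuteNonQuery();
}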
We use such a method to load and update millions of records in batches. The best speed is obtained by eliminating the network round trips and letting the database server handle the bulk of the work.
I hope this might provide you with an alternate solution to evaluate.
It may not be best practice, but you could embed some logic inside a SQL Server CLR function. This function could be called by a query or a stored procedure, or scheduled to run at a certain time.
The only issue I can see is getting step 4 to make the required changes on T. Embedding that logic in the database could be detrimental to maintainability, but this is no different from people who embed massive amounts of business logic in stored procedures.
Either way, SQL Server CLR functions may be the way to go. You can create them in Visual Studio 2008 or 2010 (check the database project types under new projects).
Tutorial : http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx

Copying data from one oracle database to another oracle database using C#

What is the standard way of copying data from one Oracle database to another?
1) Read data from the source table and copy it to a temp table on the destination, driven by configuration (i.e. there is more than one table and each table has a separate temp table).
2) Right now there is no CLOB data, but in the future CLOB data might be used.
3) Read everything into memory (if the data is large, read it in chunks).
Oracle database links should not be used.
Files should not be used.
The code should be C# only, without any database procedures.
One way that I've used to do this is to use a DataReader on the source database and just perform inserts on the target database (using bind parameters, for sure); a sketch follows.
Note that the DataReader is excellent at not using much memory as it moves through a table (I believe that by default it uses a fast-forward, read-only cursor). This means that only a small amount of data is held in memory at a given time.
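A minimal sketch of that pattern with the Oracle managed provider (connection strings and table/column names are placeholders):
using Oracle.ManagedDataAccess.Client;

using (var src = new OracleConnection(sourceConnectionString))
using (var dest = new OracleConnection(destConnectionString))
{
    src.Open();
    dest.Open();

    // forward-only reader: only a small window of rows is held in memory
    var reader = new OracleCommand(
        "SELECT ID, NAME FROM SRC_TABLE", src).ExecuteReader();

    // prepare one insert with bind parameters and reuse it for every row
    var insert = new OracleCommand(
        "INSERT INTO DEST_TEMP_TABLE (ID, NAME) VALUES (:id, :name)", dest);
    var id = insert.Parameters.Add("id", OracleDbType.Int32);
    var name = insert.Parameters.Add("name", OracleDbType.Varchar2);

    while (reader.Read())
    {
        id.Value = reader.GetInt32(0);
        name.Value = reader.GetString(1);
        insert.ExecuteNonQuery();
    }
}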
Here are the things to watch out for:
Relationships
If you're working with data that has relationships, you're going to need to deal with that. There are two ways that I've seen to deal with this:
Temporarily drop the relationships in the target database before doing the copy, then recreate them afterwards.
Copy the data in an order that satisfies the relationships (this is usually pretty difficult / inefficient).
Auto Generated Id Values
These columns are usually handled by disabling the auto-increment functionality for the given table and allowing identity insert (I'm using SQL Server terms; I can't remember how it works in Oracle).
Transactions
If you're moving a lot of data, transactions will be expensive.
Repeatability / Deleting Target Data
Unless you're way more awesome than the rest of us, you'll probably have to run this thing more than once (at least during development). That means you might want a way to delete the target data.
Platform Specific Methods
In SQL Server, there are ways to perform bulk inserts that are blazingly fast (by giving up little things like referential integrity checking). There might be a similar feature within the Oracle toolset.
Table / Column Metadata
I haven't had to do this in Oracle yet, but it looks like you can get metadata on tables and columns using the views mentioned here.
