Copying Data from Oracle Server to SQL Server - C#

I'm quite new to coding in general and I'm looking to copy 47 columns and roughly 300,000 rows of data from an Oracle database to a SQL Server database on a daily basis. The code will run as a Windows Service at the same time every day (or, more likely, every night).
The data from the Oracle DB table (let's call this Oracle_Source) will be used both to append to a history table (call this SQL_History) and to append new / update matching / delete missing rows in a live table (call this SQL_Live). The Oracle and SQL Server databases are housed on different servers, but the two SQL tables are in the same database.
I have a few questions around the best way to approach this.
Using VB/C#, is it faster to loop through rows (either 1 by 1 or batches of 100/1000/etc.) of Oracle_Source and insert/update SQL_History/SQL_Live OR copy the entire table of Oracle_Source in one go and insert into the SQL tables? Previously I have used the loop to download data into a .csv.
Using the more efficient of the above methods, would it be faster to work on both SQL tables simultaneously OR copy the data into the SQL_History table and then use that to APPEND/UPDATE/DELETE from the SQL_Live table?
Am I approaching this completely wrong?
Any other advice available is also much appreciated.

The real question is “What is the fastest way to copy the table?”
In your specific case, with two different servers and a “big” table to copy, you are probably limited by network I/O.
So the first goal is to touch only the rows that actually need to change (update/insert/delete), so that fewer bytes have to move.
To answer your first point: use transactions to speed up the writing phase on SQL Server. The right transaction size depends on several factors (database, machine, ...), but I usually build transactions of 500-1000 simple commands. In my personal experience, if you use multi-row INSERT statements, you can send around 500 rows per INSERT without performance issues.
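For illustration, here is a minimal C# sketch of that batching idea; the destination table dbo.SQL_History and its columns are placeholders, and the 500-row batch size is just a starting point to tune.

```csharp
// Minimal sketch: batching parameterized INSERTs inside explicit transactions.
// Table and column names (SQL_History, Col1, Col2) are placeholders for illustration.
using System.Data;
using System.Data.SqlClient;

static class BatchInserter
{
    public static void InsertInBatches(DataTable rows, string sqlConnectionString, int batchSize = 500)
    {
        using (var conn = new SqlConnection(sqlConnectionString))
        {
            conn.Open();
            int i = 0;
            while (i < rows.Rows.Count)
            {
                using (SqlTransaction tx = conn.BeginTransaction())
                {
                    using (var cmd = new SqlCommand(
                        "INSERT INTO dbo.SQL_History (Col1, Col2) VALUES (@p1, @p2)", conn, tx))
                    {
                        cmd.Parameters.Add("@p1", SqlDbType.NVarChar, 100);
                        cmd.Parameters.Add("@p2", SqlDbType.Int);

                        // Commit every batchSize rows so each transaction stays small.
                        for (int n = 0; n < batchSize && i < rows.Rows.Count; n++, i++)
                        {
                            cmd.Parameters["@p1"].Value = rows.Rows[i]["Col1"];
                            cmd.Parameters["@p2"].Value = rows.Rows[i]["Col2"];
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tx.Commit();
                }
            }
        }
    }
}
```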
In my experience, a bulk copy is faster than even efficient INSERT, UPDATE and DELETE statements, because the database does not maintain keys and does not check for duplicate rows during the load.
In more detail:
TRUNCATE all the data,
DISABLE the keys,
do one massive INSERT of all the rows, and
re-ENABLE the keys.
This is the fastest way to copy a table, but if the servers communicate over a slow network it may not be the best choice.
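A hedged sketch of that pattern for the Oracle-to-SQL Server case follows; it assumes the Oracle.ManagedDataAccess provider is referenced, and the connection strings, the Oracle_Source query and the dbo.SQL_History table are placeholders. The "disable keys" step is shown with constraint NOCHECK/CHECK; non-clustered indexes could be disabled and rebuilt the same way with ALTER INDEX.

```csharp
// Sketch of the TRUNCATE / disable keys / massive INSERT / re-enable keys pattern,
// streaming straight from Oracle into SQL Server with SqlBulkCopy.
using System.Data.SqlClient;
using Oracle.ManagedDataAccess.Client;   // assumes the managed Oracle provider is referenced

static class OracleToSqlCopier
{
    public static void CopyOracleTable(string oracleConnStr, string sqlConnStr)
    {
        using (var oracle = new OracleConnection(oracleConnStr))
        using (var sql = new SqlConnection(sqlConnStr))
        {
            oracle.Open();
            sql.Open();

            // 1) empty the destination and relax its constraints (the "disable keys" step)
            Exec(sql, "TRUNCATE TABLE dbo.SQL_History");
            Exec(sql, "ALTER TABLE dbo.SQL_History NOCHECK CONSTRAINT ALL");

            // 2) massive insert, streamed from the Oracle reader
            using (var cmd = new OracleCommand("SELECT * FROM Oracle_Source", oracle))
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(sql)
                   { DestinationTableName = "dbo.SQL_History", BatchSize = 10000, BulkCopyTimeout = 0 })
            {
                bulk.WriteToServer(reader);
            }

            // 3) re-enable and re-validate the constraints
            Exec(sql, "ALTER TABLE dbo.SQL_History WITH CHECK CHECK CONSTRAINT ALL");
        }
    }

    private static void Exec(SqlConnection conn, string statement)
    {
        using (var cmd = new SqlCommand(statement, conn)) { cmd.ExecuteNonQuery(); }
    }
}
```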
Obviously, the best choice depends on your infrastructure and on the size of the table.
For example:
If one server is on your LAN and the other is in the cloud, the bottleneck is the speed of the internet connection and you must pay more attention to efficient communication (fewer bytes).
If both servers are on your LAN with gigabit connections, the network can probably sustain somewhere around 100 MB/s, and you can simply move all the table rows without headache.

Related

How to copy data from one database to another with good performance?

I have a conceptual question.
I have two databases which have the same structure. One database already contains a lot of data. This data should be transferred to the other database via SELECT and INSERT.
How can I do this data migration with the highest performance?
My first approach was to sort all the tables into a list so that tables containing foreign keys are placed after the tables they reference. But with this solution it is impossible to start parallel processing.
My second idea was to create a custom type that contains the table name, the names of the referenced tables, and a bool flag that stores whether the table's data has already been copied. One instance of this type is stored in a list for each table. Then I start a new thread per table that checks, before copying, whether the referenced tables have already been copied. If not, I call Thread.Sleep() and then check again.
Is there a well performing approach to this problem?
Any suggestions will be helpful.
EDIT:
The old database is a SQLBase database.
The new database is a Microsoft SQL Server database.
You may use either:
1) SQL Server Replication
or
2) the SQL Server MERGE statement
You can use SQL Server Linked Servers to connect different database platforms (e.g. SQL Server, MySQL, DB2, ...).
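For illustration, a minimal sketch of pushing rows through a linked server in one server-side statement issued from C#; the linked server name SQLBASE_SRC and the table names are hypothetical, and the same OPENQUERY source could feed a MERGE statement instead of a plain INSERT...SELECT.

```csharp
// Sketch: copy rows across a linked server with a single server-side statement.
// Assumes a linked server named SQLBASE_SRC has already been configured;
// the linked-server and table names are placeholders.
using System.Data.SqlClient;

static class LinkedServerCopier
{
    public static void CopyViaLinkedServer(string sqlConnStr)
    {
        const string sql = @"
            INSERT INTO dbo.TargetTable (Id, Name)
            SELECT Id, Name
            FROM   OPENQUERY(SQLBASE_SRC, 'SELECT Id, Name FROM SourceTable');";

        using (var conn = new SqlConnection(sqlConnStr))
        using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 })
        {
            conn.Open();
            cmd.ExecuteNonQuery();   // the data never passes through the client
        }
    }
}
```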
Best advice: stick with SQLBase.
Second best advice: use the SQLBase UNLOAD command via SQLTalk. With the right options, this writes to an external file all the DDL statements required to recreate the database elsewhere - including triggers, stored procedures, indexes, etc. - plus all the data to load. This file can then optionally be edited programmatically, if need be, into a SQL Server format (not much difference). There are many options to the UNLOAD command which can't be covered in detail here, but here's a link to the syntax.
Note that SQLBase v12 has been released in recent months and performance has increased tenfold. With the right tuning, indexes, etc., it will outstrip SQL Server in terms of efficiency and performance. On a 100 GB database our response times have gone from 50 seconds to 3 seconds with no additional work.

How to quickly insert 4,500,000 records into a SQL Server database with C#

I have a program in C# that inserts 4,500,000 records into SQL Server using ExecuteNonQuery, and it takes far too long.
I need a fast way to insert them that takes at most about 10 minutes. When I insert the 4,500,000 records from another table into my table via Management Studio, it takes 3 minutes.
The SqlBulkCopy class is designed for fast insertion of large sets of data.
However, you need to understand that with that kind of size, disk access speed and network latency/bandwidth/saturation come into play.
Your example of populating one table from another is not a valid comparison in such a scenario, as there you are copying on the same machine.
As Oded has already stated, the starting point for this is SqlBulkCopy. However, if you have control over the database, you should also check that the Recovery Model of the database is set to Simple or Bulk Logged. Without this you will take a heavy hit from SQL Server writing transaction log entries. You also need to ensure that the SqlBulkCopyOptions.TableLock option is set.
These two are straightforward. It can also be worth playing with the SqlBulkCopy BatchSize setting and with the transaction model (see UseInternalTransaction), but these settings are harder to give advice on, as the optimal values can be quite different for different scenarios. If TableLock and checking the Recovery Model don't quite get you the speed you want, then you can start playing with the batch size, but it is more of a grey area.
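To make that concrete, a minimal SqlBulkCopy sketch with TableLock and an explicit BatchSize; the destination table name and the source DataTable are placeholders. The recovery model itself is a database-level setting (e.g. ALTER DATABASE ... SET RECOVERY BULK_LOGGED), not something SqlBulkCopy controls.

```csharp
// Sketch: SqlBulkCopy with a table lock and an explicit batch size.
// The destination table name and the source DataTable are placeholders.
using System.Data;
using System.Data.SqlClient;

static class BulkLoader
{
    public static void BulkLoad(DataTable source, string connectionString)
    {
        var options = SqlBulkCopyOptions.TableLock;   // take a bulk update lock on the table

        using (var bulk = new SqlBulkCopy(connectionString, options))
        {
            bulk.DestinationTableName = "dbo.TargetTable";
            bulk.BatchSize = 10000;      // rows per batch; worth tuning for your hardware
            bulk.BulkCopyTimeout = 0;    // no timeout for a long-running load
            bulk.WriteToServer(source);
        }
    }
}
```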

SQL Server bulk insert/update vs MERGE in an insert-or-update scenario

I need to find the best way to insert or update data in a database using SQL Server and ASP.NET. It is a standard scenario: if the data exists it is updated, if not it is inserted. I know there are many topics here about this, but none has answered what I need to know.
My problem is that there is really no problem when you update/insert 5k-10k rows, but what about 50k and more?
My first idea was to use the SQL Server 2008 MERGE command, but I have some performance concerns if it will be 50k+ rows. Also, I don't know whether I can merge data this way based not on the primary key (an int id) but on another unique key in the table (to be precise, a product serial number that will not change over time).
My second idea was to first fetch all the product serials, compare the new data's serials against them, split the data into rows to insert and rows to update, and then just make one bulk insert and one bulk update.
I just don't know which will be better. With MERGE I don't know what the performance will be, and it is only supported from SQL Server 2008, but it looks quite simple; the second option doesn't need SQL Server 2008 and the batches should be fast, but first selecting all the serials and dividing the data based on them could carry some performance penalty.
What is your opinion - which should I choose?
MERGE performs considerably better, because “one of the most important advantages of the MERGE statement is that all the data is read and processed only once”.
You don't need a primary key; you can join on whichever field or fields make your records unique.
There should be no problem performing the merge on the serial number as you've described it. You may want to read Optimizing MERGE Statement Performance for Microsoft's recommended best practices when using MERGE.
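For illustration, a hedged sketch of such a MERGE keyed on the serial number, issued from C# against a pre-loaded staging table; the table and column names (dbo.Products, dbo.ProductsStaging, SerialNumber, Name, Price) are made up for the example.

```csharp
// Sketch: MERGE keyed on a unique serial number rather than the primary key.
// Table and column names are illustrative; dbo.ProductsStaging is assumed to have
// been bulk-loaded with the incoming data beforehand.
using System.Data.SqlClient;

static class ProductMerger
{
    public static void MergeBySerial(string connectionString)
    {
        const string mergeSql = @"
            MERGE dbo.Products AS target
            USING dbo.ProductsStaging AS source
                ON target.SerialNumber = source.SerialNumber
            WHEN MATCHED THEN
                UPDATE SET target.Name  = source.Name,
                           target.Price = source.Price
            WHEN NOT MATCHED BY TARGET THEN
                INSERT (SerialNumber, Name, Price)
                VALUES (source.SerialNumber, source.Name, source.Price);";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(mergeSql, conn) { CommandTimeout = 0 })
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```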

How to write data to a database efficiently using C#?

My Windows app reads a text file and inserts it into the database. The problem is that the text file is extremely big (at least for our low-end machines). It has 100 thousand rows, and it takes a long time to write them to the database.
Can you suggest how I should read and write the data efficiently so that it does not hog the machine's memory?
FYI...
Column delimiter : '|'
Row delimiter : NewLine
It has approximately 10 columns (client information: first name, last name, address, phones, emails, etc.).
Please note that I am restricted from using BULK commands.
You don't say what kind of database you're using, but if it is SQL Server, then you should look into the BULK INSERT command or the BCP utility.
Given that there is absolutely no chance of your security folks allowing BULK commands, here is the approach I would take:
Make sure you read the entire text file first, before inserting into the database, thus reducing the I/O.
Check what indexes you have on the destination table. Can you insert into a temporary table with no indexes or dependencies so that the individual inserts are fast?
Does this data need to be visible immediately after insert? If not then you can have a scheduled job to read from the temp table in step 2 above and insert into the destination table (that has indexes, foreign keys etc.).
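As a rough sketch of the first two points under those constraints (no BULK commands): read the file up front, then insert into an index-free staging table in small transactions. The staging table dbo.Clients_Staging and its columns are placeholders, and only three of the roughly ten columns are shown.

```csharp
// Sketch: read the '|' delimited file in one go, then insert into an index-free
// staging table in transactional batches, without any BULK commands.
using System.Data;
using System.Data.SqlClient;
using System.IO;

static class ClientFileLoader
{
    public static void LoadClients(string path, string connectionString, int batchSize = 1000)
    {
        string[] lines = File.ReadAllLines(path);   // step 1: read the whole file first

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            SqlTransaction tx = conn.BeginTransaction();
            var cmd = new SqlCommand(
                "INSERT INTO dbo.Clients_Staging (FirstName, LastName, Email) VALUES (@fn, @ln, @em)",
                conn, tx);
            cmd.Parameters.Add("@fn", SqlDbType.NVarChar, 100);
            cmd.Parameters.Add("@ln", SqlDbType.NVarChar, 100);
            cmd.Parameters.Add("@em", SqlDbType.NVarChar, 200);

            for (int i = 0; i < lines.Length; i++)
            {
                string[] fields = lines[i].Split('|');
                cmd.Parameters["@fn"].Value = fields[0];
                cmd.Parameters["@ln"].Value = fields[1];
                cmd.Parameters["@em"].Value = fields[2];
                cmd.ExecuteNonQuery();

                if ((i + 1) % batchSize == 0)        // commit in batches to keep transactions small
                {
                    tx.Commit();
                    tx = conn.BeginTransaction();
                    cmd.Transaction = tx;
                }
            }
            tx.Commit();
            cmd.Dispose();
        }
    }
}
```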
Is it possible for you to register a custom assembly in SQL Server? (I'm assuming it is SQL Server, because you've already said you used bulk insert earlier.)
Then you can call your assembly to do (mostly) whatever you need, such as getting the file from some service (or wherever it comes from), parsing it, and inserting directly into the tables.
This is not an option I like, but it can be a lifesaver sometimes.
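For what it's worth, a minimal sketch of what such a SQL CLR procedure could look like, assuming the assembly is registered with EXTERNAL_ACCESS permission so it may read files; the file layout, staging table and column names are illustrative.

```csharp
// Sketch of a SQL CLR stored procedure: once the assembly is registered
// (CREATE ASSEMBLY ... WITH PERMISSION_SET = EXTERNAL_ACCESS), SQL Server can
// call this to read the file and insert rows through the context connection.
// File layout, table and column names are illustrative.
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

public static class ClientLoader
{
    [SqlProcedure]
    public static void LoadClientFile(SqlString path)
    {
        using (var conn = new SqlConnection("context connection=true"))
        {
            conn.Open();
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.Clients_Staging (FirstName, LastName) VALUES (@fn, @ln)", conn))
            using (var reader = new StreamReader(path.Value))
            {
                cmd.Parameters.Add("@fn", SqlDbType.NVarChar, 100);
                cmd.Parameters.Add("@ln", SqlDbType.NVarChar, 100);

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split('|');
                    cmd.Parameters["@fn"].Value = fields[0];
                    cmd.Parameters["@ln"].Value = fields[1];
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```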

Faster way to update 250k rows with SQL

I need to update about 250k rows in a table, and each field to update will have a different value depending on the row itself (not calculated from the row id or the key, but determined externally).
I tried with a parametrized query but it turns out to be slow (I still can try with a table-value parameter, SqlDbType.Structured, in SQL Server 2008, but I'd like to have a general way to do it on several databases including MySql, Oracle and Firebird).
Making a huge concat of individual updates is also slow (BUT about 2 times faster than making thousands of individual calls (roundtrips!) using parametrized queries)
What about creating a temp table and running an update joining my table and the tmp one? Will it work faster?
How slow is "slow"?
The main problem with this is that it would create an enormous entry in the database's log file (in case there's a power failure half-way through the update, the database needs to log each action so that it can rollback in the event of failure). This is most likely where the "slowness" is coming from, more than anything else (though obviously with such a large number of rows, there are other ways to make the thing inefficient [e.g. doing one DB roundtrip per update would be unbearably slow], I'm just saying once you eliminate the obvious things, you'll still find it's pretty slow).
There are a few ways you can do it more efficiently. One would be to do the update in chunks of, say, 1,000 rows at a time. That way, the database writes lots of small log entries rather than one really huge one.
Another way would be to turn off - or turn "down" - the database's logging for the duration of the update. In SQL Server, for example, you can set the Recovery Model to Simple or Bulk-Logged, which would speed it up considerably (with the caveat that you are more at risk if there's a power failure or something during the update).
Edit: Just to expand a little more, probably the most efficient way to actually execute the queries in the first place would be to do a BULK INSERT of all the new rows into a temporary table, and then do a single UPDATE of the existing table from that (or to do the UPDATE in chunks of 1,000 as I said above). Most of my answer was addressing the problem once you've implemented it like that: you'll still find it's pretty slow...
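For illustration, a hedged C# sketch of that temp-table approach: bulk-load the new values into a staging table, then apply them with one set-based UPDATE. The #Staging definition, dbo.BigTable and the column names are placeholders.

```csharp
// Sketch: bulk-load the new values into a staging table, then apply them
// to the real table with a single set-based UPDATE. Names are illustrative.
using System.Data;
using System.Data.SqlClient;

static class BulkUpdater
{
    public static void BulkUpdate(DataTable newValues, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // staging table holding just the key and the externally computed value
            using (var create = new SqlCommand(
                "CREATE TABLE #Staging (Id INT PRIMARY KEY, NewValue NVARCHAR(200));", conn))
            {
                create.ExecuteNonQuery();
            }

            // bulk-load the ~250k new values in one go (same connection, so #Staging is visible)
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#Staging" })
            {
                bulk.WriteToServer(newValues);
            }

            // one set-based UPDATE instead of 250k round trips
            using (var update = new SqlCommand(@"
                UPDATE t
                SET    t.SomeColumn = s.NewValue
                FROM   dbo.BigTable AS t
                JOIN   #Staging     AS s ON s.Id = t.Id;", conn) { CommandTimeout = 0 })
            {
                update.ExecuteNonQuery();
            }
        }
    }
}
```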
call a stored procedure if possible
If the columns updated are part of indexes you could
drop these indexes
do the update
re-create the indexes.
If you need these indexes to retrieve the data, well, it doesn't help.
You should use SqlBulkCopy with the SqlBulkCopyOptions.KeepIdentity flag set.
As part of a SqlTransaction, run a query to SELECT all the records that need updating and then DELETE them, returning the selected (and now removed) records. Read them into C# in a single batch. Update the records in memory on the C# side, now that you've narrowed the selection, and then SqlBulkCopy those updated records back, keys and all. And don't forget to commit the transaction. It's more work, but it's very fast.
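A hedged sketch of that approach; the table dbo.BigTable, the NeedsUpdate filter and the ComputeNewValue helper are hypothetical stand-ins for however you select and recalculate the rows.

```csharp
// Sketch: inside one transaction, DELETE the rows that need changing (returning
// them via OUTPUT), fix them up in memory, then bulk copy them back with
// KeepIdentity so the original key values are preserved. Names are illustrative.
using System.Data;
using System.Data.SqlClient;

static class DeleteAndReload
{
    public static void Run(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                // 1) remove the rows to change and capture them in one round trip
                var removed = new DataTable();
                using (var cmd = new SqlCommand(
                    "DELETE FROM dbo.BigTable OUTPUT DELETED.* WHERE NeedsUpdate = 1;", conn, tx))
                using (var adapter = new SqlDataAdapter(cmd))
                {
                    adapter.Fill(removed);
                }

                // 2) apply the externally determined changes in memory
                foreach (DataRow row in removed.Rows)
                {
                    row["SomeColumn"] = ComputeNewValue(row);   // hypothetical helper
                }

                // 3) push the rows back, keeping the original identity/key values
                using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, tx)
                       { DestinationTableName = "dbo.BigTable" })
                {
                    bulk.WriteToServer(removed);
                }

                tx.Commit();
            }
        }
    }

    private static object ComputeNewValue(DataRow row)
    {
        // placeholder for the externally determined value
        return row["SomeColumn"];
    }
}
```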
Here's what I would do:
Retrieve the entire table, that is, the columns you need in order to calculate/retrieve/find/produce the changes externally
Calculate/produce those changes
Run a bulk insert to a temporary table, uploading the information you need server-side in order to do the changes. This would require the key information + new values for all the rows you intend to change.
Run SQL on the server to copy new values from the temporary table into the production table.
Pros:
Running the final step server-side is faster than running tons and tons of individual SQL, so you're going to lock the table in question for a shorter time
Bulk insert like this is fast
Cons:
Requires extra space in your database for the temporary table
Produces more log data, logging both the bulk insert and the changes to the production table
Here are things that can make your updates slow:
executing updates one by one through parameterized queries
solution: do the update in one statement
a large transaction creates a big log entry
see codeka's answer
updating indexes (the RDBMS updates the index after each row; if you change an indexed column, this can be very costly on a large table)
if you can, drop the indexes before the update and recreate them afterwards
updating a field that has a foreign key constraint - for each inserted record the RDBMS will go and look up the corresponding key
if you can, disable the foreign key constraints before the update and re-enable them afterwards
triggers and row-level checks
if you can, disable the triggers before the update and re-enable them afterwards (a combined sketch of these last points follows below)
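For illustration only, a minimal C# sketch of that disable/update/re-enable sequence against SQL Server; the table dbo.BigTable, the non-clustered index IX_BigTable_SomeColumn, the dbo.Staging table and the column names are all hypothetical, and the UPDATE is just a stand-in for whatever set-based statement applies your changes.

```csharp
// Sketch: disable the index, foreign keys and triggers on the table, run one
// set-based update, then re-enable and re-validate everything. Names are illustrative.
using System.Data.SqlClient;

static class RelaxedUpdater
{
    public static void UpdateWithChecksDisabled(string connectionString)
    {
        string[] steps =
        {
            "ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable DISABLE;",  // non-clustered index on the updated column
            "ALTER TABLE dbo.BigTable NOCHECK CONSTRAINT ALL;",             // foreign key / check constraints off
            "ALTER TABLE dbo.BigTable DISABLE TRIGGER ALL;",                // triggers off

            @"UPDATE t SET t.SomeColumn = s.NewValue
              FROM dbo.BigTable AS t JOIN dbo.Staging AS s ON s.Id = t.Id;",

            "ALTER TABLE dbo.BigTable ENABLE TRIGGER ALL;",
            "ALTER TABLE dbo.BigTable WITH CHECK CHECK CONSTRAINT ALL;",    // re-validate the data
            "ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable REBUILD;"
        };

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (string step in steps)
            {
                using (var cmd = new SqlCommand(step, conn) { CommandTimeout = 0 })
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```

Re-enabling with WITH CHECK forces SQL Server to re-validate the foreign keys afterwards, so the window in which inconsistent data could slip in is limited to the update itself.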
