My scenario is that I have [mostly] matching tables in 2 different databases. The table schema will usually be identical, but sometimes will change (column additions, renames, removals). I need to copy data from Source to Destination under the following conditions:
Attempt to insert all rows from source to destination. No updates.
Unique constraints will cause inserts to fail for duplicates. This is OK.
If columns don't quite match up between source and destination, the insert must still succeed. Yes, this is the tricky part. Wherever columns match, the insert should use the matching columns. Any columns that are new, removed, or renamed can be skipped and ignored, but the overall insert must succeed.
Source may be Access or MSSQL and destination may be Access or MSSQL, so any SQL must work for both.
This is not a one-time thing. It's part of a software/data upgrade process that will occur over and over for many customers, with different datasets and different tables. But again, most columns will always match (table names will always be same on both sides, with similar schema), with occasional column differences and unique constraints.
It's OK for an insert to fail due to a unique constraint violation (this means there's a duplicate record).
It's NOT OK for an insert to fail due to column mismatches. I need to find a way for these to match up as well as possible. Non-matching columns can be skipped/ignored.
Unfortunately the table schemas weren't designed very well and thus tables do not have an int primary key. Many tables have multi-column keys, but these can't be used because ANY table in the source may need to be copied to ANY table in the destination. So reliance on keys won't work.
I'm using Visual Studio 2015 and the latest C#, with SQL Server and Access.
This is a strange scenario, but I need a robust way to handle it. I don't care if it's an ugly hack, it just needs to work. Can anyone think of a good approach for this?
I did a project a couple of years ago where I had the same challenge. In my case there were multiple remote stores that were offline during the month (no internet connection), and every two weeks they needed to connect to the internet and sync the data between their local database and the HQ database.
Basically, first they had to send their local data to HQ, and then they needed to receive the HQ changes back.
If you don't have a large amount of data (in my case the database had more than 600 tables and around 60 million rows), you can first send the client data to the server and merge it with the existing data, then drop the client table and batch insert fresh data from the server.
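For the column-mismatch requirement in the question, one possible approach is to read the column list from both sides at run time and build the INSERT from the intersection. The following is a minimal sketch in Python with sqlite3 standing in for Access/MSSQL. The helper names `common_columns` and `copy_table` are hypothetical, and `PRAGMA table_info` is SQLite-specific; with ADO.NET the column list would come from the provider's schema metadata instead. Renamed columns are simply treated as non-matching and skipped.

```python
import sqlite3

def common_columns(src, dst, table):
    """Return column names present in both copies of `table`, in source order."""
    def cols(conn):
        # PRAGMA table_info is SQLite-specific; row[1] is the column name.
        return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    dst_cols = {c.lower() for c in cols(dst)}
    return [c for c in cols(src) if c.lower() in dst_cols]

def copy_table(src, dst, table):
    """Insert every source row into the destination, one row at a time.

    Returns (inserted, skipped); skipped counts rows rejected by a constraint.
    """
    columns = common_columns(src, dst, table)
    if not columns:
        return 0, 0  # nothing in common, nothing to copy
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})'
    inserted = skipped = 0
    for row in src.execute(f'SELECT {col_list} FROM "{table}"'):
        try:
            dst.execute(insert_sql, row)
            inserted += 1
        except sqlite3.IntegrityError:
            # Unique-constraint violations (duplicates) land here and are
            # ignored; other constraint failures such as NOT NULL do too.
            skipped += 1
    dst.commit()
    return inserted, skipped
```

Inserting row by row is slow, but it lets a duplicate fail on its own without aborting the rest of the table.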
Evening all,
Background: I have many XML data files which I need to import into a SQL database via my C# WPF-based Windows application. Whilst some data files are a simple and straightforward INSERT, many require validation and verification checks, and require GUIDs from existing records (from within the db) before the INSERT takes place in order to maintain certain relationships.
So I've broken the process into three stages:
Validate records. Many checks exist, e.g. 50,000 accounts in the XML file must have a matching reference with an associated record already in the database. If there is no corresponding account, abandon the entire import process. Another would be that only the 'Current' record can be updated, so again, if it points to a 'historic' record then it needs to crash and burn.
UPDATE the associated database records, e.g. set from 'Current' to 'Historic', as the record is to be superseded by what's to come next...
INSERT the records across multiple tables, e.g. 50,000 accounts to be inserted.
So, in summary, I need to validate records before any changes can be made to the database with a few validation checks. At which point I will change various tables for existing records before inserting the 'latest' records.
My first attempt to resolve this situation was to load the XML file into an XmlDocument or XDocument instance, iterate over every account in the XML file and perform an SQL command for every account: verify it exists, verify it's a current account, change the record before inserting it. Rinse and repeat for thousands of records. Not ideal, to say the least.
So my second attempt is to load the XML file into a data table, export the corresponding accounts from the database into another data table, and perform a nested-loop validation, e.g. does DT1.AccountID exist in DT2.AccountID, move DT2.GUID to DT1.GUID, etc. I appreciate this could also be a slow process. That said, I then have the luxury of performing both the UPDATE and INSERT stored procedures with a table-valued parameter (TVP), making use of the data table information.
I appreciate many will suggest letting SQL do all of the work, but I'm lacking in that skillset unfortunately (happy to learn if that's the general consensus), and I would much rather do the work in C# code if at all possible.
Any views on this are greatly appreciated. Many of the questions I've found are around bulk INSERT, not so much about validating existing records, followed by updating records, followed by inserting records. I suppose my question is around the first part, the validation. Does extracting the data from the db into a data table to work on seem wrong, old-fashioned, pointless?
I'm sure I've missed out some vital piece of information, so apologies if unclear.
Cheers
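The nested-loop validation described above compares every incoming account against every exported account. One common alternative is to index the exported records by AccountID once and look each incoming record up in that index. The sketch below shows the idea in Python; the function name `validate_accounts` and the field names "AccountID", "GUID" and "Status" are assumptions chosen to mirror the question, and in C# a `Dictionary` keyed on the account ID would play the same role.

```python
def validate_accounts(incoming, existing):
    """Match incoming records to database records by AccountID.

    incoming: list of dicts parsed from the XML file (each has "AccountID").
    existing: list of dicts exported from the database, each with
              "AccountID", "GUID" and "Status".
    Returns (matched, errors). The caller should abandon the import
    if errors is non-empty.
    """
    # Build the index once so each lookup is a hash lookup, not a scan.
    by_id = {rec["AccountID"]: rec for rec in existing}
    matched, errors = [], []
    for rec in incoming:
        db_rec = by_id.get(rec["AccountID"])
        if db_rec is None:
            errors.append((rec["AccountID"], "no matching account"))
        elif db_rec["Status"] != "Current":
            errors.append((rec["AccountID"], "account is not current"))
        else:
            # Carry the database GUID over to the incoming record.
            matched.append({**rec, "GUID": db_rec["GUID"]})
    return matched, errors
```

The matched list can then be passed on to the UPDATE and INSERT steps, while any entry in errors stops the import before the database is touched.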
We are facing the following problems and are trying to solve them as efficiently as possible:
Import data (130k+ rows) from an Excel file
Load it into a given table in a SQL Server database
Disperse (normalize) the data from that table into 6 others
So far we've managed to cope with the first 2 problems.
We access the data from the file with the help of the Excel Data Reader library, then bind and bulk insert the loaded data with EntityFramework.BulkInsert. So far the process takes less than 2 minutes for a little over 130k rows, which is acceptable.
The big problem comes when we try to normalize the data in the table loaded on the server.
The Database contains previous information and must be operational during changes => no table locking.
The table has 130 columns and contains information that is dispersed into 6 tables. The main record keeps foreign keys to the other 5.
The other records might or might not already be present in the database. If they are present, missing information is updated.
The tricky part for us is knowing the ID of the sub entities at the time of the insertion of the main entity. Insertion row by row would solve the problem but it would take too much time.
We would like to use some kind of bulk insert, but we're not sure how to keep the connection between the records.
The database design is legacy and reorganising it is not an option. :)
Any ideas for tools or approaches, or any advice on the matter, are welcome :)
Thanks in advance.
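One way to keep the connection between records without inserting row by row is to do the normalization with set-based SQL: insert the missing sub-entities first, then insert the main records while resolving each foreign key through a join on a natural key. The following is a rough sketch using Python and sqlite3. The table and column names (`staging`, `city`, `person`, `city_name`) are invented for illustration, and it assumes each sub-entity can be identified by some natural key held in the staging table. Updating missing information on existing sub-entities is not shown.

```python
import sqlite3

def normalize(conn):
    """Split staging rows into `city` (sub-entity) and `person` (main entity)."""
    # 1. Insert the sub-entities that do not exist yet, in one statement.
    conn.execute("""
        INSERT INTO city (name)
        SELECT DISTINCT s.city_name
        FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM city c WHERE c.name = s.city_name)
    """)
    # 2. Insert the main records; the join on the natural key supplies
    #    the generated ID of each sub-entity as the foreign key.
    conn.execute("""
        INSERT INTO person (name, city_id)
        SELECT s.person_name, c.id
        FROM staging s
        JOIN city c ON c.name = s.city_name
    """)
    conn.commit()
```

With five sub-entity tables, step 1 would be repeated once per table and step 2 would join all five.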
I have a conceptual question.
I have two databases which have the same structure. One database already contains a lot of data. This data should be transferred to the other database via SELECT and INSERT.
How can I do this data migration with the highest performance?
My first approach was to sort all the tables in a list where the tables which contain foreign keys will be stored behind the referenced tables. But with this solution it will be impossible to start parallel processing.
The second idea was to create a custom type which contains the table name, the names of the referenced tables, and a bool flag which stores whether the data in the table has been copied. An instance of this type is stored in a list for each table. Then, for each table, I start a new thread that checks before copying whether the referenced tables have already been created. If not, I execute Thread.Sleep(), after which I check again.
Is there a well performing approach to this problem?
Any suggestions will be helpful.
EDIT:
The old database is a SQLBase database.
The new database is a Microsoft SQL Server database.
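The two ideas above (order tables by foreign-key dependency, and copy in parallel) can be combined with a topological sort that releases a table as soon as all the tables it references are done, which avoids polling with Thread.Sleep(). The sketch below uses Python's standard `graphlib.TopologicalSorter` and a thread pool. The function name `copy_in_dependency_order` and the `copy_table` callback are hypothetical, and the dependency map is assumed to have been read from the schema beforehand.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

def copy_in_dependency_order(dependencies, copy_table, max_workers=4):
    """Copy tables in parallel while respecting foreign-key order.

    dependencies: dict mapping each table to the set of tables it references.
    copy_table:   callable taking a table name and copying that table.
    Returns the tables in the order their copies finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular references
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Start every table whose referenced tables are all copied.
            for table in sorter.get_ready():
                running[pool.submit(copy_table, table)] = table
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the copy
                table = running.pop(future)
                sorter.done(table)  # unblocks tables that reference it
                finished.append(table)
    return finished
```

Circular references would need to be broken first, for example by disabling the constraints involved during the copy.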
You may use either:
1) SQL Server Replication
Or
2) SQL Server Merge statement
You may use SQL Server Linked Servers to connect to different database platforms (e.g. SQL Server, MySQL, DB2, ...).
Best advice: stick with SQLBase.
Second best advice: use the SQLBase UNLOAD command via SQLTalk. This will write all DDL statements required to recreate the database elsewhere, including triggers, stored procedures, indexes, etc., plus all the data to load (if you use the right options), to an external file. This file can then optionally be edited programmatically, if need be, to put it into SQL Server format (there is not much difference). There are many options to the UNLOAD command which can't be covered here in detail, but here's a link to the syntax.
Note that SQLBase v12 has been released in recent months and performance has increased tenfold. With the right tuning, indexes, etc., it will outstrip SQL Server in terms of efficiency and performance. On a 100 GB database our response times have gone from 50 seconds to 3 seconds with no additional work.
What is the standard way of copying data from one Oracle database to another?
1) Read data from the source table and copy it to a temp table on the destination using configuration (i.e. there is more than one table, and each table has a separate temp table)
2) Right now there is no CLOB data, but in future CLOB data might be used.
3) Read everything into memory (if the data is large, read it in chunks)
Should not use Oracle links
Should not use files
Code should use only C#, not any database procedures.
One way that I've used to do this is to use a DataReader on the source database and just perform inserts on the target database (using Bind Parameters for sure).
Note that the DataReader is excellent at not using much memory as it moves through a table (it reads rows as a forward-only, read-only stream). This means that only a small amount of data is held in memory at a given time.
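As an illustration of the streaming idea, here is a small sketch in Python with sqlite3 standing in for the two databases. The cursor plays the role of the DataReader, rows are pulled in fixed-size chunks, and each chunk is written with a parameterized INSERT. The function name `stream_copy` and the `batch_size` default are assumptions, and table and column names are expected to come from trusted configuration since they are interpolated into the SQL text.

```python
import sqlite3

def stream_copy(src, dst, table, columns, batch_size=1000):
    """Copy `columns` of `table` from src to dst in chunks of batch_size rows."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    cursor = src.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        # Only one chunk of rows is held in memory at a time.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany(insert_sql, rows)  # values are bound as parameters
        copied += len(rows)
    dst.commit()
    return copied
```

The chunk size is a trade-off between memory use and the number of round trips to the target database.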
Here are the things to watch out for:
Relationships
If you're working with data that has relationships, you're going to need to deal with that. There are two ways that I've seen to deal with this:
Temporarily drop the relationships in the target database before doing the copy, then recreate them after.
Copy the data in the correct order for the relationships to work correctly (this is usually pretty difficult / inefficient)
Auto Generated Id Values
These columns are usually handled by disabling the auto increment functionality for the given table and allowing identity insert (I'm using some SQL Server terms, I can't remember how it works on Oracle).
Transactions
If you're moving a lot of data, transactions will be expensive.
Repeatability / Deleting Target Data
Unless you're way more awesome than the rest of us, you'll probably have to run this thing more than once (at least during development). That means you might want a way to delete the target data.
Platform Specific Methods
In SQL Server, there are ways to perform bulk inserts that are blazingly fast (by giving up little things like referential integrity checking). There might be a similar feature within the Oracle toolset.
Table / Column Metadata
I haven't had to do this in Oracle yet, but it looks like you can get metadata on tables and columns using the views mentioned here.
I'm building an application to import data into a SQL Server 2008 Express database.
This database is being used by an application that is currently in production.
The data that needs to be imported comes from various sources, mostly Excel sheets and XML files.
The database has the following tables:
tools
powertools
strikingtools
owners
Each row or XML tag in the source files has information about one tool:
name, tooltype, weight, wattage, owner, material, etc...
Each of these rows has the name of the tool's owner. This name has to be inserted into the owners table, but only if the name isn't already in there.
For each of these rows a new row needs to be inserted in the tools table.
The tools table has a field owner_id with a foreign key to the owners table, which needs to be set to the primary key of the corresponding row in the owners table.
Depending on the tooltype a new row must be created in either the powertools table or the strikingtools table. These 2 tables also have a tool_id field with a foreign key to the tools table that must be filled in.
The tools table has a tool_owner_id field with a foreign key to the owners table that must be filled in.
If any of the rows in the import file fails to import for some reason, the entire import needs to be rolled back.
Currently I'm using a dataset to do this, but for some large files (over 200.000 tools) this requires quite a lot of memory. Can anybody think of a better approach for this?
There are two main issues to be solved:
Parsing a large XML document efficiently.
Adding a large number of records to the database.
XML Parsing
Although the DataSet approach works, the whole XML document is loaded into memory. To improve the efficiency of working with large XML documents you might want to look at the XmlReader class. The API is slightly more difficult to use than what DataSet provides, but you get the benefit of not loading the whole DOM into memory at once.
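To make the streaming idea concrete, here is a rough Python analogue using `xml.etree.ElementTree.iterparse`, which, like XmlReader, hands over elements as they are parsed instead of building the whole document first. The `<tools>`/`<tool>` layout and the `iter_tools` helper are assumptions made for the example; the real file format is not shown in the question.

```python
import io
import xml.etree.ElementTree as ET

def iter_tools(source):
    """Yield one dict per <tool> element as the file is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tool":
            # The element is complete here: read its child tags as fields.
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the children that were just read

# Hypothetical sample document with two tools.
SAMPLE = (
    b"<tools>"
    b"<tool><name>Hammer</name><owner>Ann</owner></tool>"
    b"<tool><name>Drill</name><owner>Bob</owner></tool>"
    b"</tools>"
)
tools = list(iter_tools(io.BytesIO(SAMPLE)))
```

In a real import the generator would be consumed row by row (or batch by batch) rather than collected into a list.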
Inserting records to the DB
To satisfy your Atomicity requirement you can use a single database transaction but the large number of records you are dealing with for a single transaction is not ideal. You will most likely incur issues like:
Database having to deal with a large number of locks
Database locks that might escalate from row locks to page locks and even table locks.
Concurrent use of the database will be severely affected during the import.
I would recommend the following instead of a single DB transaction:
See if it is possible to create smaller transaction batches, maybe 100 records at a time. Perhaps it is possible to logically load sections of the XML file together, where it would be acceptable to load a subset of the data as a unit into the system.
Validate as much of your data as possible upfront, e.g. check that required fields are filled in and that FKs are correct.
Make the upload repeatable. Skip over existing data.
Provide a manual undo strategy. I know this is easier said than done, but might even be required as an additional business rule. For example the upload was successful but someone realises a couple of hours later that the wrong file was uploaded.
It might be useful to upload your data to an initial staging area in your DB to perform validations and to mark which records have been processed.
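The batching and repeatability suggestions above might look roughly like this. The sketch uses Python with sqlite3; the `tools` table with a unique `name` column and the helper `import_in_batches` are assumptions, and `INSERT OR IGNORE` is SQLite syntax for skipping rows that already exist (other databases need a different construct, such as a NOT EXISTS check).

```python
import sqlite3
from itertools import islice

def import_in_batches(conn, rows, batch_size=100):
    """Insert (name, owner) rows, committing one transaction per batch.

    Rows whose name already exists are skipped, so the import can be
    rerun after a failure. Returns the number of batches committed.
    """
    it = iter(rows)
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # `with conn` commits on success and rolls back only this batch
        # if an error is raised; earlier batches stay committed.
        with conn:
            conn.executemany(
                "INSERT OR IGNORE INTO tools (name, owner) VALUES (?, ?)",
                batch,
            )
        batches += 1
    return batches
```

Note that this trades the all-or-nothing rollback for shorter transactions, so it only fits if loading a subset of the file is acceptable.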
Use SSIS, and create an ETL package.
Use transactions for the rollback feature, and stored procedures that handle creating/checking the foreign keys.