Import DBF or Excel file into SQL Server with dynamic column mapping using ASP.NET - C#

Recently, I've been importing DBF files of approximately 50 to 250 million records, each with a different column structure, into a database table.
Suppose I have a database table "RawUpload" consisting of 68 columns to store the uploaded details. This table structure stays the same, but the dBASE/Excel files contain columns that are not mapped one-to-one. Consider a case where dBASE file A1 has a column named 'TrxnNo' while another file A2 has 'TrxnNum' or 'TradeNo', any of which should map to the table column TransactionNumber nvarchar(255). Similarly, there are roughly 10 to 30 columns in the files whose names can change frequently. It's hard to track them all, so I want to map the file columns to the database table columns dynamically at runtime.
To accomplish this I've tried the following:
Using an SSIS package, but I couldn't get it to handle a dynamic column structure.
Using SqlBulkCopy, which takes a long time to complete and even ends in timeouts and a hung application (see the sketch after this list).
Using a DTS (Interop.DTS) package, which works fine but fails for huge numbers of records.
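For reference, a minimal sketch of the SqlBulkCopy route with the column mapping built at runtime; the alias dictionary, table name and batch settings below are illustrative assumptions, and for files of this size you would normally stream an IDataReader into WriteToServer rather than materialising a full DataTable.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class DynamicBulkLoader
{
    // Known source-column aliases mapped to the fixed destination columns (hypothetical list).
    static readonly Dictionary<string, string> ColumnAliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "TrxnNo",  "TransactionNumber" },
            { "TrxnNum", "TransactionNumber" },
            { "TradeNo", "TransactionNumber" }
            // ... register the other 10-30 volatile columns here
        };

    public static void Load(DataTable source, string connectionString)
    {
        using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
        {
            bulk.DestinationTableName = "dbo.RawUpload";
            bulk.BatchSize = 10000;      // commit in chunks rather than one huge transaction
            bulk.BulkCopyTimeout = 0;    // disable the timeout that was being hit
            bulk.EnableStreaming = true;

            // Build the column mappings from whatever columns this particular file has.
            foreach (DataColumn col in source.Columns)
            {
                string destination;
                if (!ColumnAliases.TryGetValue(col.ColumnName, out destination))
                    destination = col.ColumnName;   // assume an identical name when no alias is registered
                bulk.ColumnMappings.Add(col.ColumnName, destination);
            }

            bulk.WriteToServer(source);
        }
    }
}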
Now my question is: what is the fastest way to import such big files into a SQL Server table with dynamic column mapping, and one that will still work at this data volume?
Thanks

Related

SSIS package: how to import CSV lines selectively based on a query

I have multiple CSV files with 100 or so columns each that I want to import into a SQL database using an SSIS package. These CSV files arrive every night, and I want our SQL tables to function as a history/track-changes table.
In other words, I'm required to evaluate each line of the CSV prior to import based on a unique identifier. I need to check the latest (based on date of import) entry for an ID in the table; if it exists and differs from the new line in the CSV, the new line should be imported. If it's a duplicate, it should be ignored. If the ID doesn't exist at all, the line should also be imported.
I can't simply filter out all duplicates, as a change from X to Y and then back to X should be recorded in the table (as it's a history/change table).
Originally I tried to get this working with a flat file import -> Lookup tool -> DB destination, but it doesn't look as though I can modify the Lookup tool to use a specific query as opposed to just comparing the indicated columns with the DB to see if they exist. Is there a way to achieve this using the provided SSIS tools? The only alternative I can see is creating a custom script task to evaluate each line of the CSV beforehand and writing only the data that needs to be inserted to a temp table or a new CSV. This might be a perfectly viable solution, but I'm afraid of performance issues.
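If a per-row script task does prove too slow, one common alternative (not a pure SSIS answer) is to bulk the nightly file into a staging table and let a single set-based statement decide what to insert. The sketch below uses invented names (Staging, History, RecordId, Value, ImportDate), not the actual schema from the question.

using System.Data.SqlClient;

static class HistoryImporter
{
    // Inserts staging rows whose value differs from the latest history entry, or that are new.
    public static void MergeChanges(string connectionString)
    {
        const string sql = @"
            INSERT INTO dbo.History (RecordId, Value, ImportDate)
            SELECT s.RecordId, s.Value, GETDATE()
            FROM dbo.Staging AS s
            OUTER APPLY (SELECT TOP (1) h.Value
                         FROM dbo.History AS h
                         WHERE h.RecordId = s.RecordId
                         ORDER BY h.ImportDate DESC) AS latest
            WHERE latest.Value IS NULL         -- ID never seen before
               OR latest.Value <> s.Value;     -- value changed since the last import";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}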

Upload CSV file, then map it to tables in the database

How can we upload a CSV file through the web (ASP.NET MVC, C#) and map the columns in the CSV to our tables in the database?
Example:
CSV File:
Username,Address,Roles,Date
How do I add all the values in the 'Username' column to the User table's Name column?
Values in the 'Address' column to the AddrDet table's Address column?
Values in the 'Roles' column to the RolesDet table's Roles column?
AND let the user choose which CSV columns get added to the database? (So not every column in the CSV will be taken.)
Using ASP.NET MVC C#.
Because all I know is that when the CSV is uploaded, a DataTable is created specifically for the CSV, and every column in the CSV will be uploaded.
Thank You
I'm using MVC and EF database-first.
This question was marked as a duplicate of Upload CSV file to SQL server.
I don't feel (and don't think) that the question is related to, or covers exactly the same topic as, that one, so I'm answering it. I have myself flagged the question as too broad, as there is too much to explain.
I will also add some links to the answer; however, they are not there to pad it out, only to give the OP an idea of which questions/topics to look into.
Short explanation:
Usually when you want to import data (a CSV file) into a database, you already have the structure and schema of the data (and of the database): there is an existing TableA and TableB with certain columns inside. If you want to dynamically create new columns or update the DB schema based on a CSV file, that is difficult work (and normally doesn't happen).
A C#/ASP.NET application works in a way where you give it an input (a user's click, a data load, a scheduler passing a time checkpoint) and the app does the work.
A typical job looks like: "We get data in this format, the app has to convert it to the inner representation (classes) and then insert it into the server." So you have to write an ASP.NET page where you allow the user to paste/upload the file. E.g.: File Upload ASP.NET MVC 3.0
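For the upload step, a minimal controller action along these lines could work (the controller name, redirect target and storage folder are placeholders):

using System.IO;
using System.Web;
using System.Web.Mvc;

public class ImportController : Controller
{
    [HttpPost]
    public ActionResult Upload(HttpPostedFileBase csvFile)
    {
        if (csvFile == null || csvFile.ContentLength == 0)
            return new HttpStatusCodeResult(400, "No file was uploaded.");

        // Save the upload to App_Data so it can be parsed in the next step.
        var path = Path.Combine(Server.MapPath("~/App_Data"), Path.GetFileName(csvFile.FileName));
        csvFile.SaveAs(path);

        return RedirectToAction("Index");
    }
}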
Once you have loaded the file, you need to convert the CSV format (the format of the stored data) into your internal representation, which means creating your own class with some properties and converting (transforming) the CSV into instances of it. E.g.: Importing CSV data into C# classes
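A hand-rolled conversion of each CSV line into a class might look like the sketch below; the UserRow class and the column order follow the Username,Address,Roles,Date example above, and a real import with quoted fields would be better served by a CSV library such as CsvHelper.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class UserRow
{
    public string Username { get; set; }
    public string Address { get; set; }
    public string Roles { get; set; }
    public DateTime Date { get; set; }
}

public static class CsvImport
{
    public static List<UserRow> Read(string path)
    {
        return File.ReadLines(path)
                   .Skip(1)                          // skip the header: Username,Address,Roles,Date
                   .Select(line => line.Split(','))
                   .Select(parts => new UserRow
                   {
                       Username = parts[0],
                       Address = parts[1],
                       Roles = parts[2],
                       Date = DateTime.Parse(parts[3])
                   })
                   .ToList();
    }
}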
Once you have this data inside classes (objects, i.e. instances of classes), you can work with it and carry out the internal work. Here we are looking at CRUD (Create/Read/Update/Delete) operations against a SQL database. First you need to connect to the SQL Server, choose a database and then run the queries. E.g.: https://www.codeproject.com/Articles/837599/Using-Csharp-to-connect-to-and-query-from-a-SQL-da
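Connecting and writing one of those values with plain ADO.NET could then be sketched like this (the User table and Name column come from the question; the connection string and schema are assumptions):

using System.Data.SqlClient;

public static class UserWriter
{
    public static void InsertUser(string username, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("INSERT INTO dbo.[User] (Name) VALUES (@name)", conn))
        {
            cmd.Parameters.AddWithValue("@name", username);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}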
Plenty of developers don't want to write the queries themselves and prefer a more object-oriented approach to this sort of problem. They use an ORM (object-relational mapper), which lets them use the same class/object schema in the database and in the application. One example among many is Entity Framework (EF). E.g.: http://www.entityframeworktutorial.net/
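With Entity Framework the same insert looks roughly like the following code-first sketch; since the OP uses DB-first, the context and entity classes would already be generated rather than written by hand.

using System.Data.Entity;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ImportContext : DbContext
{
    public DbSet<User> Users { get; set; }
}

public static class EfUserWriter
{
    public static void InsertUser(string username)
    {
        using (var ctx = new ImportContext())
        {
            ctx.Users.Add(new User { Name = username });
            ctx.SaveChanges();   // issues the INSERT against the mapped Users table
        }
    }
}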
As you can see, this topic is not so easy and requires knowledge of several areas of programming.

Editing a large dataset for SQLBulkCopy into a SQL Server database

I have a VERY large (50 million+ records) dataset that I am importing from an old Interbase database into a new SQL Server database.
My current approach is:
acquire CSV files from the Interbase database (done, using a program called "FBExport" I found online)
The schema of the old database doesn't match the new one (not under my control), so now I need to mass-edit certain fields so that they work in the new database. This is the area I need help with.
after editing to the correct schema, I am using SqlBulkCopy to copy the newly edited dataset into the SQL Server database.
Part 3 works very quickly; diagnostics show that importing 10,000 records at once is done almost instantly.
My current (slow) approach to part 2 is to read the CSV file line by line and look up the relevant information (e.g. the CSV file has an ID in the form XXX########, whereas the new database has a separate column for the XXX and the ########; another example: the CSV file references a model via a string, but the new database references it via an ID in the model table), then insert a new row into a local table, and then SqlBulkCopy once the local table gets large.
My question is: what would be the "best" approach (performance-wise) for this data-editing step? I figure there is very likely a LINQ-type approach to this; would that perform better, and how would I go about it if so?
If step #3’s importing is very quick, I would be tempted to create a temporary database whose schema exactly matches the old database and import the records into it. Then I’d look at adding additional columns to the temporary table where you need to split the XXX######## into XXX and ########. You could then use SQL to split the source column into the two separate ones. You could likewise use SQL to do whatever ID based lookups and updates you need to ensure the record relationships continue to be correct.
Once the data has been massaged into a format which is acceptable, you can insert the records into the final tables using IDENTITY_INSERT ON, excluding all legacy columns/information.
In my mind, the primary advantage of doing it within the temporary SQL DB is that at any time you can write queries to ensure that record relationships using the old key(s) are still correctly related to records using the new database’s auto generated keys.
This is of course based on me being more comfortable doing data transformations/validation in SQL than in C#.
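As a rough illustration of the splitting and lookup steps described above, run from C# against the temporary database (every table and column name here is invented for the example, e.g. LegacyId holding the XXX######## value):

using System.Data.SqlClient;

static class StagingTransforms
{
    // Splits the combined legacy key into its two parts and resolves the model string to an ID.
    public static void Transform(string connectionString)
    {
        const string sql = @"
            UPDATE dbo.StagingRecords
            SET    Prefix = LEFT(LegacyId, 3),
                   Number = SUBSTRING(LegacyId, 4, LEN(LegacyId) - 3);

            UPDATE s
            SET    s.ModelId = m.Id
            FROM   dbo.StagingRecords AS s
            JOIN   dbo.Model          AS m ON m.Name = s.ModelName;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0;   // set-based updates over 50M+ rows can run for a while
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}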

Import CSV into SQL multiple tables

I'm migrating data from one system to another and will be receiving a CSV file with the data to import. The file could contain up to a million records. I need to take each line in the file, validate it and put the data into the relevant tables. For example, a CSV line would look like:
Mr,Bob,Smith,1 high street,London,ec1,012345789,work(this needs to be looked up in another table to get the ID)
There's a lot more data than this example in the real files.
So, the SQL would be something like this:
DECLARE @UserID int
INSERT INTO [User]
VALUES ('Mr', 'Bob', 'Smith', '0123456789')
SET @UserID = SCOPE_IDENTITY()
INSERT INTO Address
VALUES (@UserID, '1 high street', 'London', 'ec1',
        (SELECT ID FROM AddressType WHERE AddressTypeName = 'work'))
I was thinking of iterating over each row and calling a stored procedure, containing the SQL above, with the parameters from the file. Would this be the best way of tackling this? It's not time-critical, as this will just be run once when updating the site.
I'm using C# and SQL Server 2008 R2.
What about loading it into a temporary table (note that this may be logically temporary, not necessarily technically) as staging, then processing it from there? This is standard ETL behaviour (and a million rows is tiny for ETL): you first stage the data, then clean it, then put it in its final place.
When performing tasks of this nature, you do not think in terms of looping through each record individually, as that would be a huge performance problem. In this case you bulk insert the records into a staging table, or use the import wizard to load them into a staging table (watch out for the default 50-character column width, especially in the address field).
Then you write set-based code to do any clean-up you need: removing bad telephone numbers, zip codes or email addresses, dropping records missing data in fields that are required in your database, and transforming data using lookup tables. Suppose you have a table of required values; those are likely not the same values you will find in this file, so you need to convert them. We use doctor specialties a lot, so our system might store a value as 'GP' while the file gives us 'General Practitioner'. You need to look at all the non-matching values for the field and then decide whether you can map them to existing values, whether you need to throw the record out, or whether you need to add more values to your lookup table.
Once you have discarded the records you don't want and cleaned up those you can in the staging table, you import into the production tables. Inserts should be written using the SELECT form of INSERT, not the VALUES clause, when you are inserting more than one or two records.
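A skeleton of that final set-based load, using invented staging/production names and the doctor-specialty example from the answer:

using System.Data.SqlClient;

static class StagingToProd
{
    public static void LoadDoctors(string connectionString)
    {
        const string sql = @"
            INSERT INTO dbo.Doctor (Name, SpecialtyId)
            SELECT s.Name, sp.Id
            FROM   dbo.StagingDoctor AS s
            JOIN   dbo.Specialty     AS sp ON sp.Code = s.SpecialtyCode  -- already mapped, e.g. 'General Practitioner' -> 'GP'
            WHERE  s.IsValid = 1;                                        -- only rows that survived the clean-up";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}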

Multiple update from C# 3.0

I have a folder that contains more than 100 .txt files. These files contain a huge amount of data that needs to be collated and applied as updates to a SQL Server table. Each line of a text file has only two columns that interest me, and based on the first column's value (which is a PK in the SQL Server table) I want to update the second column's value in the DB table.
Please suggest the best way to do this in C# 3.0.
Presently I am using a StringBuilder and appending the UPDATE queries.
I think you should first use bulk copy to import the data into SQL Server and then write a stored procedure to do the update. Building up the query text and then running it will make the whole process slow.
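A sketch of that two-step approach; the staging and target table names, the column names, and the assumption that the .txt files are tab-delimited with the key in the first column are all placeholders.

using System.Data;
using System.Data.SqlClient;
using System.IO;

static class TxtUpdater
{
    public static void Run(string folder, string connectionString)
    {
        // 1. Collect the two interesting columns from every .txt file.
        var table = new DataTable();
        table.Columns.Add("KeyValue", typeof(string));
        table.Columns.Add("NewValue", typeof(string));

        foreach (var file in Directory.GetFiles(folder, "*.txt"))
            foreach (var line in File.ReadAllLines(file))
            {
                var parts = line.Split('\t');        // assumed delimiter
                if (parts.Length >= 2)
                    table.Rows.Add(parts[0], parts[1]);
            }

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 2. Bulk copy into a pre-created staging table with the same two columns.
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.StagingUpdates";
                bulk.WriteToServer(table);
            }

            // 3. One set-based UPDATE (this statement could live in the stored procedure instead).
            const string sql = @"
                UPDATE t
                SET    t.TargetColumn = s.NewValue
                FROM   dbo.TargetTable    AS t
                JOIN   dbo.StagingUpdates AS s ON s.KeyValue = t.PrimaryKeyColumn;";

            using (var cmd = new SqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }
}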
