How to write data into a database efficiently using C#?

My Windows app reads a text file and inserts its contents into the database. The problem is that the text file is extremely big (at least for our low-end machines). It has 100 thousand rows and it takes a long time to write them to the database.
Can you guys suggest how I should read and write the data efficiently so that it does not hog machine memory?
FYI...
Column delimiter: '|'
Row delimiter: newline
It has approximately 10 columns (client information: first name, last name, address, phones, emails, etc.).
CONSIDER THAT... I AM RESTRICTED FROM USING BULK COMMANDS.

You don't say what kind of database you're using, but if it is SQL Server, then you should look into the BULK INSERT command or the BCP utility.

Given that there is absolutely no chance of getting help from your security folks to use the BULK commands, here is the approach I would take:
Make sure you read the entire text file first, before you start inserting into the database, rather than interleaving reads and writes; this reduces the I/O.
Check what indexes you have on the destination table. Can you insert into a temporary table with no indexes or dependencies so that the individual inserts are fast?
Does this data need to be visible immediately after the insert? If not, a scheduled job can read from the temp table in step 2 above and insert into the destination table (the one with indexes, foreign keys, etc.).
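Since bulk commands are off the table, here is a minimal sketch of steps 1 and 2: batched, parameterized inserts into a staging table inside small transactions. The Client_Staging table, its three columns and the batch size of 1,000 are placeholders for your real schema.

using System;
using System.Data;
using System.Data.SqlClient; // classic ADO.NET provider on .NET Framework
using System.IO;

class StagingLoader
{
    public static void Load(string path, string connectionString)
    {
        // Step 1: read the whole file up front, as suggested above.
        string[] lines = File.ReadAllLines(path);

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            const int batchSize = 1000;                 // tune for your machine
            SqlTransaction tx = connection.BeginTransaction();
            int inBatch = 0;

            // Reuse one parameterized command for every row.
            using (var cmd = new SqlCommand(
                "INSERT INTO Client_Staging (FirstName, LastName, Address1) " +
                "VALUES (@fn, @ln, @addr)", connection, tx))
            {
                cmd.Parameters.Add("@fn", SqlDbType.NVarChar, 100);
                cmd.Parameters.Add("@ln", SqlDbType.NVarChar, 100);
                cmd.Parameters.Add("@addr", SqlDbType.NVarChar, 200);

                foreach (string line in lines)
                {
                    string[] fields = line.Split('|');
                    cmd.Parameters["@fn"].Value = fields[0];
                    cmd.Parameters["@ln"].Value = fields[1];
                    cmd.Parameters["@addr"].Value = fields[2];
                    cmd.ExecuteNonQuery();

                    // Commit every batchSize rows so each transaction stays small.
                    if (++inBatch == batchSize)
                    {
                        tx.Commit();
                        tx = connection.BeginTransaction();
                        cmd.Transaction = tx;
                        inBatch = 0;
                    }
                }

                tx.Commit();                            // final partial batch
            }
        }
    }
}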

Is it possible for you to register your own custom assembly in SQL Server? (I'm assuming it's SQL Server because you've already said you tried bulk insert earlier.)
Then you can call your assembly to do (mostly) whatever you need, like getting the file from some service (or wherever it comes from), parsing it, and inserting directly into the tables.
This is not an option I like, but it can be a life-saver sometimes.
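For reference, a SQLCLR procedure is just a static method marked with the SqlProcedure attribute that you deploy with CREATE ASSEMBLY and CREATE PROCEDURE ... EXTERNAL NAME. The sketch below is minimal and hypothetical (the file path handling, the Client_Staging table and its columns are made up), and reading files from SQLCLR requires the assembly to be registered with the EXTERNAL_ACCESS permission set.

using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Server;

public class FileImport
{
    [SqlProcedure]
    public static void ImportClientFile(string path)
    {
        // "context connection=true" runs inside the calling session, no extra login needed.
        using (var connection = new SqlConnection("context connection=true"))
        {
            connection.Open();
            foreach (string line in File.ReadAllLines(path))
            {
                string[] fields = line.Split('|');
                using (var cmd = new SqlCommand(
                    "INSERT INTO Client_Staging (FirstName, LastName) VALUES (@fn, @ln)",
                    connection))
                {
                    cmd.Parameters.AddWithValue("@fn", fields[0]);
                    cmd.Parameters.AddWithValue("@ln", fields[1]);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}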

Related

Any way other than scripting to back up data for one month of a single table in database?

I have actually built a C# application that does a daily differential backup with the compression option, to reduce the .bak file size and increase the speed of the backup. However, the file size is still too large for me. So I plan to back up the whole database, but I want one of the tables (the one that contains the most data) to contain only one month of data. Any ideas? Thanks.
If I'm not mistaken, what you mean by "large" is that backing up the whole database takes too much time and processing power.
I assume that you don't have the privileges to modify the table in the original database. Maybe you can try SELECT INTO (https://www.w3schools.com/sql/sql_select_into.asp)
to copy just the rows you pick out with a WHERE clause into a table in another (external) database, and then back that database up.
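A minimal sketch of that idea from C#, assuming the big table is dbo.BigTable with a CreatedDate column and the copy goes to a database named ArchiveDb on the same instance; every name here is a placeholder.

using System.Data.SqlClient;

class MonthlyCopy
{
    public static void CopyLastMonth(string connectionString)
    {
        // SELECT INTO creates the target table and copies only the filtered rows.
        // Drop the target first if it already exists; this is meant as a one-off copy.
        const string sql = @"
            SELECT *
            INTO   ArchiveDb.dbo.BigTable_LastMonth
            FROM   dbo.BigTable
            WHERE  CreatedDate >= DATEADD(MONTH, -1, GETDATE());";

        using (var connection = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, connection))
        {
            connection.Open();
            cmd.CommandTimeout = 0;        // large copies can exceed the 30 s default
            cmd.ExecuteNonQuery();
        }
    }
}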

Copying Data from Oracle Server to SQL Server

I'm quite new to coding in general and I'm looking to copy 47 columns and roughly 300,000 rows of data from an Oracle database to a SQL Server database on a daily basis. The code will run as a Windows Service at the same time every day (or more likely every night).
The data from the Oracle DB table (let's call this the Oracle_Source) will be used to both append to a history table (call this SQL_History) and also to append new/update matching/delete missing rows from a live table (call this SQL_Live). The two types of databases are housed on different servers, but the two SQL tables are on the same DB.
I have a few questions around the best way to approach this.
Using VB/C#, is it faster to loop through the rows of Oracle_Source (either one by one or in batches of 100/1,000/etc.) and insert/update SQL_History/SQL_Live, OR to copy the entire Oracle_Source table in one go and insert it into the SQL tables? Previously I have used the loop approach to download data into a .csv.
Using the more efficient of the above methods, would it be faster to work on both SQL tables simultaneously, OR to copy the data into the SQL_History table and then use that to APPEND/UPDATE/DELETE from the SQL_Live table?
Am I approaching this completely wrong?
Any other advice available is also much appreciated.
The correct question is "What is the fastest way to copy the table?"
In your specific case, with two different servers and a "big" table to copy, you are probably limited by network I/O.
So the first point is to touch only the rows that must change (update / insert / delete), so there are fewer bytes to move.
To answer your first point: use transactions to improve write speed on SQL Server. The right transaction size depends on different factors (database, machine, ...), but I usually make transactions of 500-1,000 simple commands. In my personal experience, you can also use multi-row INSERT statements and send about 500 rows per INSERT without performance issues.
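A hedged sketch of the multi-row INSERT variant mentioned above, assuming a destination table SQL_History with three placeholder columns; 500 rows of 3 columns is 1,500 parameters, which stays under SQL Server's 2,100-parameter limit, and each statement auto-commits as its own small transaction.

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

static class MultiRowInsert
{
    public static void Write(IEnumerable<object[]> rows, string connectionString,
                             int rowsPerStatement = 500)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            var buffer = new List<object[]>(rowsPerStatement);

            foreach (object[] row in rows)
            {
                buffer.Add(row);
                if (buffer.Count == rowsPerStatement)
                {
                    Flush(buffer, connection);
                    buffer.Clear();
                }
            }
            if (buffer.Count > 0)
                Flush(buffer, connection);              // last partial statement
        }
    }

    // Builds one INSERT ... VALUES (...),(...),... statement for the buffered rows.
    static void Flush(List<object[]> buffer, SqlConnection connection)
    {
        var sql = new StringBuilder("INSERT INTO SQL_History (Col1, Col2, Col3) VALUES ");
        using (var cmd = new SqlCommand { Connection = connection })
        {
            for (int i = 0; i < buffer.Count; i++)
            {
                if (i > 0) sql.Append(",");
                sql.AppendFormat("(@a{0},@b{0},@c{0})", i);
                cmd.Parameters.AddWithValue("@a" + i, buffer[i][0]);
                cmd.Parameters.AddWithValue("@b" + i, buffer[i][1]);
                cmd.Parameters.AddWithValue("@c" + i, buffer[i][2]);
            }
            cmd.CommandText = sql.ToString();
            cmd.ExecuteNonQuery();
        }
    }
}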
In my experience, a bulk copy is faster than even an efficient mix of INSERT, UPDATE and DELETE, because during the load the database does not maintain keys and does not check for duplicate rows.
Better explained:
TRUNCATE all the data,
DISABLE the keys,
do one massive INSERT of all the rows, and
re-ENABLE the keys.
This is the fastest way to copy a table, but if the two servers communicate over a slow network link it may not be the best choice.
Obviously, the best choice depends on your infrastructure and the size of the table.
For example:
If one server is on your LAN and the second is in the cloud, the bottleneck is the speed of the internet connection and you must pay more attention to efficient communication (fewer bytes).
If both servers are on your LAN with gigabit connections, the network can probably move around 100 MB/s, and you can simply copy all the table rows without a headache.
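A hedged sketch of that TRUNCATE / disable / bulk load / re-enable sequence from C#, assuming the rows from Oracle_Source are already in a DataTable and that SQL_Live has a nonclustered index with a made-up name; adjust every object name to your schema.

using System.Data;
using System.Data.SqlClient;

static class FullTableRefresh
{
    public static void Refresh(DataTable sourceRows, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // 1. TRUNCATE and 2. disable nonclustered indexes (disabling the clustered
            //    index would make the table unreadable, so leave that one alone).
            using (var cmd = new SqlCommand(
                "TRUNCATE TABLE dbo.SQL_Live; " +
                "ALTER INDEX IX_SQL_Live_SomeIndex ON dbo.SQL_Live DISABLE;", connection))
            {
                cmd.ExecuteNonQuery();
            }

            // 3. Massive insert via bulk copy.
            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "dbo.SQL_Live";
                bulk.BatchSize = 10000;
                bulk.BulkCopyTimeout = 0;
                bulk.WriteToServer(sourceRows);
            }

            // 4. Rebuild (re-enable) the indexes.
            using (var cmd = new SqlCommand(
                "ALTER INDEX ALL ON dbo.SQL_Live REBUILD;", connection))
            {
                cmd.CommandTimeout = 0;
                cmd.ExecuteNonQuery();
            }
        }
    }
}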

How large of a SQL string (for mass update) can I realistically use in C# .net4?

I'm receiving and parsing a large text file.
In that file I have a numerical ID identifying a row in a table, and another field that I need to update.
ID Current Location
=========================
1 Boston
2 Cambridge
3 Idaho
I was thinking of creating a single SQL command string and firing that off using ADO.NET, but some of the files I'm going to receive have thousands of lines. Is this doable, or is there a limit I'm not seeing?
If you may have thousands of lines, then composing a SQL statement is definitely NOT the way to go. Better code-based alternatives include:
Use SQLBulkCopy to insert the change data to a staging table and then UPDATE your target table using the staging table as the source. It also has excellent batching options (unlike the other choices)
Write a stored procedure to do the Update that accepts an XML parameter that contains the UPDATE data.
Write a stored procedure to do the Update that accepts a table-valued parameter that contains the UPDATE data.
I have not compared them myself but it is my understanding that #3 is generally the fastest (though #1 is plenty fast for almost any need).
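For option 3, a minimal sketch of the client side, assuming a user-defined table type and stored procedure already exist on the server (both shown, with made-up names, in the comment at the top).

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class TvpUpdate
{
    // Assumes (hypothetical names) that the server already has:
    //   CREATE TYPE dbo.LocationUpdate AS TABLE (ID int, CurrentLocation nvarchar(100));
    //   CREATE PROCEDURE dbo.UpdateLocations @rows dbo.LocationUpdate READONLY AS
    //     UPDATE t SET t.CurrentLocation = r.CurrentLocation
    //     FROM dbo.Locations t JOIN @rows r ON r.ID = t.ID;
    public static void Run(IDictionary<int, string> updates, string connectionString)
    {
        // Shape the parsed file into a DataTable that matches the table type.
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("CurrentLocation", typeof(string));
        foreach (KeyValuePair<int, string> pair in updates)
            table.Rows.Add(pair.Key, pair.Value);

        using (var connection = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.UpdateLocations", connection))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter p = cmd.Parameters.AddWithValue("@rows", table);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.LocationUpdate";

            connection.Open();
            cmd.ExecuteNonQuery();   // one round trip carries every update row
        }
    }
}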
Writing one huge INSERT statement will be very slow. You also don't want to parse the whole massive file at once. What you need to do is something along the lines of:
Figure out a good chunk size. Let's call it chunk_size. This will be the number of records you'll read from the file at a time.
Load chunk_size number of records from the file into a DataTable.
Use SQLBulkCopy to insert the DataTable into the DB.
Repeat 2 & 3 until the file is done.
You'll have to experiment to find an optimal value for chunk_size, so start small and work your way up.
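A minimal sketch of that loop, assuming a staging table dbo.LocationStaging(ID, CurrentLocation) and a tab-delimited file; the table name, delimiter and chunk size are all placeholders.

using System.Data;
using System.Data.SqlClient;
using System.IO;

static class ChunkedBulkLoad
{
    public static void Load(string path, string connectionString, int chunkSize = 5000)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var reader = new StreamReader(path))
        {
            connection.Open();

            DataTable chunk = NewChunk();
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] fields = line.Split('\t');          // assumed delimiter
                chunk.Rows.Add(int.Parse(fields[0]), fields[1]);

                if (chunk.Rows.Count == chunkSize)
                {
                    Flush(chunk, connection);
                    chunk = NewChunk();
                }
            }

            if (chunk.Rows.Count > 0)
                Flush(chunk, connection);                    // last partial chunk
        }
    }

    static DataTable NewChunk()
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("CurrentLocation", typeof(string));
        return table;
    }

    static void Flush(DataTable chunk, SqlConnection connection)
    {
        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "dbo.LocationStaging";
            bulk.WriteToServer(chunk);
        }
    }
}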
I'm not sure of an actual limit, if one exists, but why not take "bite sized" chunks of the file that you feel comfortable with and break it into several commands? You can always wrap it in a single transaction if it's important that they all fail or succeed.
Say grab 250 lines at a time, or whatever.

Import CSV into multiple SQL tables

I'm migrating data from one system to another and will be receiving a CSV file with the data to import. The file could contain up to a million records to import. I need to get each line in the file, validate it and put the data into the relevant tables. For example, the CSV would be like:
Mr,Bob,Smith,1 high street,London,ec1,012345789,work(this needs to be looked up in another table to get the ID)
There's a lot more data than this example in the real files.
So, the SQL would be something like this:
Declare @UserID int
Insert into [User]
Values ('Mr', 'Bob', 'Smith', '0123456789')
Set @UserID = @@Identity
Insert into Address
Values ('1 high street', 'London', 'ec1', (select ID from AddressType where AddressTypeName = 'work'))
I was thinking of iterating over each row and calling a stored procedure (containing SQL like the above) with the parameters from the file. Would this be the best way of tackling it? It's not time critical, as this will just be run once when updating a site.
I'm using C# and SQL Server 2008 R2.
What about loading it into a temporary table (note that this may be logically temporary, not necessarily technically) as a staging area, and then processing it from there? This is standard ETL behaviour (and a million rows is tiny for ETL): you first stage the data, then clean it, then put it in its final place.
When performing tasks of this nature, you should not think in terms of rotating through each record individually, as that will be a huge performance problem. In this case you bulk insert the records into a staging table, or use the import wizard to load them into a staging table (watch out for the wizard's default 50-character column width, especially in the address field).
Then you write set-based code to do any clean-up you need: removing bad telephone numbers, zip codes, email addresses or states, rejecting records missing data in fields that are required in your database, or transforming data using lookup tables. Suppose you have a table with certain required values; those are likely not the same values that you will find in this file, so you need to convert them. We use doctor specialties a lot: our system might store them as "GP" but the file might give us a value of "General Practitioner". You need to look at all the non-matching values for the field and then determine whether you can map them to existing values, whether you need to throw the record out, or whether you need to add more values to your lookup table.
Once you have got rid of the records you don't want and cleaned up those you can in the staging table, you import into the prod tables. Inserts should be written using the SELECT form of INSERT, not the VALUES clause, when you are writing more than one or two records.
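A hedged sketch of that final set-based step, run from C#; the UserStaging, Address and AddressType tables and their columns are placeholders loosely based on the example above.

using System.Data.SqlClient;

static class StagingToProd
{
    public static void Promote(string connectionString)
    {
        // One INSERT ... SELECT moves every cleaned staging row in a single set-based
        // statement, resolving the address-type lookup with a JOIN instead of per-row queries.
        const string sql = @"
            INSERT INTO dbo.Address (Street, City, Postcode, AddressTypeID)
            SELECT s.Street, s.City, s.Postcode, at.ID
            FROM   dbo.UserStaging AS s
            JOIN   dbo.AddressType AS at
                   ON at.AddressTypeName = s.AddressTypeName;";

        using (var connection = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, connection))
        {
            connection.Open();
            cmd.CommandTimeout = 0;
            cmd.ExecuteNonQuery();
        }
    }
}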

Performance issues with transpose and insert large, variable column data files into SQL Server

I'm currently working on a project where we have a large data warehouse which imports several GB of data on a daily basis from a number of different sources. We have a lot of files with different formats and structures all being imported into a couple of base tables which we then transpose/pivot through stored procs. This part works fine. The initial import however, is awfully slow.
We can't use SSIS File Connection Managers as the columns can be totally different from file to file so we have a custom object model in C# which transposes rows and columns of data into two base tables; one for column names, and another for the actual data in each cell, which is related to a record in the attribute table.
Example - Data Files:
Example - DB tables:
The SQL insert is performed currently by looping through all the data rows and appending the values to a SQL string. This constructs a large dynamic string which is then executed at the end via SqlCommand.
The problem is that even loading a 1 MB file takes about a minute, so when it comes to large files (200 MB etc.) it takes hours to process a single file. I'm looking for suggestions on other ways to approach the insert that will improve performance and speed up the process.
There are a few things I can do with the structure of the loop to cut down on the string size and the number of SQL commands in the string, but ideally I'm looking for a cleaner, more robust approach. Apologies if I haven't explained myself well; I'll try to provide more detail if required.
Any ideas on how to speed up this process?
The dynamic string is going to be SLOW. Each SQLCommand is a separate call to the database. You are much better off streaming the output as a bulk insertion operation.
I understand that all your files are different formats, so you are having to parse and unpivot in code to get it into your EAV database form.
However, because the output is in a consistent schema, you would be better off either using separate connection managers and the built-in unpivot operator, or, in a script task, adding multiple rows to the data flow in the common output (just like you are currently doing when building your SQL INSERT...INSERT...INSERT for each input row) and then letting it all stream into a destination.
i.e. Read your data and, in the script source, emit the FileID, RowID, AttributeName and Value as multiple rows (so this does the unpivot in code, but instead of generating a varying number of INSERTs, you just add a varying number of rows to the data flow for each input row).
Then pass that through a lookup to get from AttributeName to AttributeID (erroring the rows with invalid attributes).
Stream straight into an OLEDB destination, and it should be a lot quicker.
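If you stay in plain C# rather than SSIS, the same streaming idea can be sketched with SqlBulkCopy instead of a concatenated SQL string; the table and column names below (CellValue, FileID, RowID, AttributeID, Value) are assumptions based on the description above, and the attribute lookup is assumed to have been loaded into a dictionary up front.

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class EavLoader
{
    // parsedRows: one dictionary of column-name -> value per data row of the file.
    public static void Load(int fileId, IEnumerable<Dictionary<string, string>> parsedRows,
                            IDictionary<string, int> attributeIds, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("FileID", typeof(int));
        table.Columns.Add("RowID", typeof(int));
        table.Columns.Add("AttributeID", typeof(int));
        table.Columns.Add("Value", typeof(string));

        int rowId = 0;
        foreach (Dictionary<string, string> row in parsedRows)
        {
            rowId++;
            foreach (KeyValuePair<string, string> cell in row)      // unpivot in code
                table.Rows.Add(fileId, rowId, attributeIds[cell.Key], cell.Value);
        }

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.CellValue";            // assumed EAV data table
            bulk.BatchSize = 10000;
            bulk.WriteToServer(table);
        }
    }
}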
One thought: are you repeatedly going back to the database to find the appropriate attribute value? If so, switching those repeated queries to a query against a recordset that you keep on the client side will speed things up enormously.
This is something I have done before with four reference tables involved. Creating a local recordset and filtering it as appropriate sped a process up from 2.5 hours to about 3 minutes.
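A minimal sketch of that client-side cache, assuming an Attribute lookup table with ID and Name columns (both names are assumptions): load it once and use the dictionary for every row instead of issuing a query per lookup.

using System.Collections.Generic;
using System.Data.SqlClient;

static class AttributeCache
{
    // One query up front replaces thousands of per-row lookups.
    public static Dictionary<string, int> Load(string connectionString)
    {
        var cache = new Dictionary<string, int>();

        using (var connection = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Name, ID FROM dbo.Attribute", connection))
        {
            connection.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    cache[reader.GetString(0)] = reader.GetInt32(1);
            }
        }

        return cache;
    }
}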
Why not store whatever reference tables are needed within each database and perform all lookups on the database end? Or it may even be better to pass a table type into each database where keys are needed, store all reference data in one central database and then perform your lookups there.
