I have over 1 million rows that I check for changes and then update.
I completed my program that goes over each record and then updates the database, but this operation takes a couple of hours to complete even with multithreading. I have optimized the queries, inserts, and checks to minimize database load and achieved much better results, but it is still very slow.
Is there any way to maintain a DataTable with the correct records in memory, and then upload the whole data structure to SQL Server as a 'virtual table' in one call and let SQL Server handle the updates?
I have seen something similar done in the past via a function on a PostgreSQL server (without involving C#). I need my program to finish in minutes, not a couple of hours.
Either insert your new data into a temp table with bulk copy or use a table-valued parameter (TVP), then use the SQL MERGE statement to update the rows in the existing table.
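As a rough sketch of the bulk-copy-plus-MERGE variant (the connection string, the staging table dbo.TargetTable_Staging, the target table dbo.TargetTable, and the Id/Value columns are all placeholder assumptions, not your schema):

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch: bulk-load the changed rows into a staging table, then let MERGE
// apply inserts/updates server-side. Table and column names are placeholders.
static void BulkUpsert(DataTable changedRows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // 1. Push the in-memory rows to the staging table in one shot.
        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "dbo.TargetTable_Staging";
            bulk.BatchSize = 10000;
            bulk.WriteToServer(changedRows);
        }

        // 2. Merge staging into the real table in a single statement.
        const string mergeSql = @"
            MERGE dbo.TargetTable AS target
            USING dbo.TargetTable_Staging AS source
                ON target.Id = source.Id
            WHEN MATCHED THEN
                UPDATE SET target.Value = source.Value
            WHEN NOT MATCHED THEN
                INSERT (Id, Value) VALUES (source.Id, source.Value);

            TRUNCATE TABLE dbo.TargetTable_Staging;";

        using (var merge = new SqlCommand(mergeSql, connection))
        {
            merge.CommandTimeout = 0; // large merges can exceed the default 30s
            merge.ExecuteNonQuery();
        }
    }
}
```

With a million rows this keeps the expensive part (the row-by-row comparison) entirely on the server, so the only network cost is the single bulk copy.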
Have you looked at the SqlBulkCopy Class?
System.Data.SqlClient.SqlBulkCopy
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
Related
I have a list of records that I need to insert or update in a SQL database, depending on whether each record is already present in the database.
The current flow is that I process each record one by one and call a stored procedure from my C# code which does the insert or update.
The above process is very inefficient. Can I use SqlBulkCopy to insert these into the SQL database in one go?
Will that increase performance?
Regards
Ankur
SqlBulkCopy can only insert. If you need to upsert, you might want to SqlBulkCopy into a staging table (a separate table off to one side that isn't part of the main model), and then do the merge in TSQL. You might also want to think about concurrency (how many people can be using the staging table at once, etc).
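For illustration, the staging step might use a session-scoped #temp table so that concurrent callers never see each other's data; everything below (table names, columns, connection string) is a placeholder sketch rather than a prescribed design:

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch: a #temp table is private to this connection, so concurrent callers
// never conflict over shared staging data. All names are placeholders.
static void UpsertViaTempTable(DataTable rows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // The #temp table lives only as long as this connection/session.
        using (var create = new SqlCommand(
            "CREATE TABLE #Staging (Id INT PRIMARY KEY, Value NVARCHAR(100));", connection))
        {
            create.ExecuteNonQuery();
        }

        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "#Staging";
            bulk.WriteToServer(rows);
        }

        const string mergeSql = @"
            MERGE dbo.TargetTable AS target
            USING #Staging AS source ON target.Id = source.Id
            WHEN MATCHED THEN UPDATE SET target.Value = source.Value
            WHEN NOT MATCHED THEN INSERT (Id, Value) VALUES (source.Id, source.Value);";

        using (var merge = new SqlCommand(mergeSql, connection))
        {
            merge.ExecuteNonQuery();
        }
    } // connection closes; #Staging is dropped automatically
}
```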
I have a table with about 100 columns and about 10000 rows.
Periodically, I will receive an Excel file with similar data, and I then need to update the table.
If new rows exist in Excel, I have to add them to the db.
If old rows have been updated, I need to update the rows in the db.
If some rows have been deleted, I need to delete them from my main table and add them to another table.
I have thought about proceeding as follows:
Fetch all rows from db into a DataSet.
Import all rows from Excel into a DataSet.
Compare these 2 DataSets now using joins and perform the required operations.
I have never worked with data of this magnitude and am worried about the performance. Let me know the ideal way to go about realizing this requirement.
Thanks in advance. :)
Don't worry about performance with 10k records; you will not notice it.
A better way might be to import the Excel file into a temp table and do the processing with a couple of simple SQL queries. You'll save on dev time, and it will potentially perform better.
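As a sketch of that import step (the ACE OLE DB provider, the sheet name Sheet1, and the staging table dbo.ExcelStaging are assumptions made for illustration):

```csharp
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

// Sketch: read the Excel sheet into a DataTable, then bulk-copy it into a
// staging table so the insert/update/delete logic can run as set-based SQL.
static void LoadExcelIntoStaging(string excelPath, string sqlConnectionString)
{
    string excelConnectionString =
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelPath +
        ";Extended Properties='Excel 12.0;HDR=YES'";

    var sheet = new DataTable();
    using (var excel = new OleDbConnection(excelConnectionString))
    using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", excel))
    {
        adapter.Fill(sheet); // ~10k rows loads quickly
    }

    using (var sql = new SqlConnection(sqlConnectionString))
    {
        sql.Open();
        using (var bulk = new SqlBulkCopy(sql))
        {
            bulk.DestinationTableName = "dbo.ExcelStaging";
            bulk.WriteToServer(sheet);
        }
    }
    // From here, run the insert/update/archive queries against dbo.ExcelStaging.
}
```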
In my experience, this is quite simple if you choose to do the work in T-SQL, as follows:
You can use OPENROWSET, OPENQUERY, linked servers, DTS, and many other things in SQL Server to import the Excel file into a temporary table.
Then you can write some simple queries to do the comparison. If you are using SQL Server 2008, MERGE was made exactly for this scenario.
Another point is performance, which is far better than doing the comparison in C#. You can use the TOP clause to chunk the comparison, and do many other things.
Hope it helps.
Cheers
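To illustrate only the TOP-based chunking idea (the tables dbo.TargetTable and dbo.ExcelStaging and their Id/Value columns are invented for this sketch):

```csharp
using System.Data.SqlClient;

// Sketch: apply the update in chunks so each statement touches at most
// batchSize rows, keeping locks and log growth small. Names are placeholders.
static void UpdateInChunks(string connectionString, int batchSize = 5000)
{
    const string chunkSql = @"
        UPDATE TOP (@BatchSize) t
        SET    t.Value = s.Value
        FROM   dbo.TargetTable AS t
        JOIN   dbo.ExcelStaging AS s ON s.Id = t.Id
        WHERE  t.Value <> s.Value;"; // only rows that actually differ

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        int affected;
        do
        {
            using (var command = new SqlCommand(chunkSql, connection))
            {
                command.Parameters.AddWithValue("@BatchSize", batchSize);
                affected = command.ExecuteNonQuery(); // rows touched this pass
            }
        } while (affected > 0); // stop once a pass changes nothing
    }
}
```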
I am trying to figure out the best way to design my C# application which utilizes data from a SQL Server backend.
My application periodically has to update 55K rows, one at a time, from a loop. Before each update it needs to check whether the record to be updated exists.
If it exists it updates one field.
If not it performs an insert of 4 fields.
The table to be updated has 600K rows.
What is the most efficient way to handle these updates/inserts from my application?
Should I create a dictionary in C#, load the 600K records into it, and query the dictionary first instead of the database?
Is this a faster approach?
Should I use a stored procedure?
What’s the best way to achieve maximum performance based on this scenario?
You could use SqlBulkCopy to upload to a temp table then have a SQL Server job do the merge.
You should try to avoid "update 55K rows each one at a time from a loop". That will be very slow.
Instead, try to find a way to batch the updates (n at a time). Look into SQL Server table value parameters as a way to send a set of data to a stored procedure.
Here's an article on updating multiple rows with TVPs: http://www.sqlmag.com/article/sql-server-2008/using-table-valued-parameters-to-update-multiple-rows
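A hedged sketch of that batching idea on the client side (the table type dbo.IdValueList and the stored procedure dbo.UpsertRows, which would contain the set-based MERGE or UPDATE/INSERT, are assumed to exist and are only placeholder names):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

// Sketch: send the 55K rows to a stored procedure in batches, as a
// table-valued parameter. dbo.IdValueList and dbo.UpsertRows are placeholders.
static void UpsertInBatches(IEnumerable<KeyValuePair<int, string>> rows,
                            string connectionString, int batchSize = 1000)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        foreach (var batch in rows
            .Select((row, i) => new { row, i })
            .GroupBy(x => x.i / batchSize, x => x.row))
        {
            // Shape the batch to match CREATE TYPE dbo.IdValueList AS TABLE (...).
            var tvp = new DataTable();
            tvp.Columns.Add("Id", typeof(int));
            tvp.Columns.Add("Value", typeof(string));
            foreach (var row in batch)
                tvp.Rows.Add(row.Key, row.Value);

            using (var command = new SqlCommand("dbo.UpsertRows", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                var parameter = command.Parameters.AddWithValue("@Rows", tvp);
                parameter.SqlDbType = SqlDbType.Structured;
                parameter.TypeName = "dbo.IdValueList";
                command.ExecuteNonQuery(); // the proc does the set-based upsert
            }
        }
    }
}
```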
What if you did something like this, instead?
By some means, get those 55,000 rows of data into the database, if they're not already there. If you're currently getting those rows from some query, arrange instead for the query results to be stored in a temporary table on that database. (This might be a proper application for a stored procedure.)
Now you could express the operations you need to perform as two separate SQL queries: one to do the updates, and one or more others to do the inserts. The first query might use a clause such as "WHERE FOO IN (SELECT BAR FROM #TEMP_TABLE ...)" to identify the rows to be updated, and the others "WHERE FOO NOT IN (...)".
This is exactly the sort of thing I would expect to need a stored procedure for, because the SQL server itself is precisely the right party to be doing the work: it is the only one that already has on hand the data you intend to manipulate, and it doesn't have to transmit those 55,000 rows anywhere.
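For illustration, those two statements might look like the following (assuming the incoming rows are already in a temp table named #Incoming on the same connection; all table and column names are made up):

```csharp
using System.Data.SqlClient;

// Sketch: once the incoming rows sit in #Incoming on the server, one UPDATE
// and one INSERT cover both cases. Table/column names are placeholders.
static void ApplyFromTempTable(SqlConnection openConnection)
{
    const string sql = @"
        -- Update the one field on rows that already exist in the target table.
        UPDATE t
        SET    t.Value = i.Value
        FROM   dbo.TargetTable AS t
        JOIN   #Incoming AS i ON i.Id = t.Id;

        -- Insert the four fields for rows that don't exist yet.
        INSERT INTO dbo.TargetTable (Id, Value, CreatedOn, Source)
        SELECT i.Id, i.Value, GETDATE(), 'import'
        FROM   #Incoming AS i
        WHERE  i.Id NOT IN (SELECT t.Id FROM dbo.TargetTable AS t);";

    using (var command = new SqlCommand(sql, openConnection))
    {
        command.ExecuteNonQuery();
    }
}
```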
I have a requirement where I need to update thousands of records in a live database table, and although there are many columns in this table, I only need to update 2-3 of them.
Further, I can't hit the database thousands of times just for the updates; they can be done as a batch update using a SQL Server table-valued parameter. But I also shouldn't update all of the records in one go, for better error handling; instead I want to update records in batches of x*100.
So, below is my approach. Please give your inputs on any alternatives or any changes to the proposed process:
1. Fetch the required records from the database into List<T> MainCollection.
2. Save this collection to an XML file with each element's Status = Pending.
3. Take the first 'n' elements from the XML file with Status = Pending and add them to a new List<T> SubsetCollection.
4. Loop over List<T> SubsetCollection and make the required changes to each T.
5. Convert List<T> SubsetCollection to a DataTable.
6. Call the update stored procedure and pass the above DataTable as a TVP.
7. Update Status = Processed for the XML elements corresponding to List<T> SubsetCollection.
8. If more records with Pending status exist in the XML file, go to step 3.
Please suggest a better approach or any enhancement to the above process.
I would take a database-only approach if possible and, if that's not possible, eliminate the parts that will be the slowest. If you are unable to do all the work in a stored procedure, then retrieve all the records and make the changes in your application.
The next step is to write the changes to a staging table with SqlBulkCopy. This is a fast bulk loader that will copy thousands of records in seconds. You will store the primary key and the columns to be updated, as well as a batch number. The batch number is assigned to each batch of records, allowing another batch to be loaded without conflicting with the first.
Use a stored procedure on the server to process the records in batches of 100 or 1000 depending on performance. Pass the batch number to the stored procedure.
We use such a method to load and update millions of records in batches. The best speed is obtained by eliminating the network and allowing the database server to handle the bulk of the work.
I hope this might provide you with an alternate solution to evaluate.
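A hedged sketch of that batch-number pattern (the staging table dbo.UpdateStaging, its BatchNumber column, and the procedure dbo.ProcessUpdateBatch are illustrative assumptions, not the actual schema used):

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch: stamp every staged row with a batch number, bulk-load the whole set,
// then have a stored procedure apply one batch at a time. Names are placeholders.
static void StageAndProcess(DataTable changes, string connectionString, int rowsPerBatch = 1000)
{
    // Stamp each row with the batch it belongs to.
    changes.Columns.Add("BatchNumber", typeof(int));
    for (int i = 0; i < changes.Rows.Count; i++)
        changes.Rows[i]["BatchNumber"] = i / rowsPerBatch;

    int batchCount = (changes.Rows.Count + rowsPerBatch - 1) / rowsPerBatch;

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "dbo.UpdateStaging";
            bulk.WriteToServer(changes);
        }

        // Let the server apply each batch; a failure only affects that batch.
        for (int batch = 0; batch < batchCount; batch++)
        {
            using (var command = new SqlCommand("dbo.ProcessUpdateBatch", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.AddWithValue("@BatchNumber", batch);
                command.ExecuteNonQuery();
            }
        }
    }
}
```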
It may not be best practice, but you could embed some logic inside a SQL Server CLR function. This function could be called from a query or stored procedure, or scheduled to run at a certain time.
The only issue I can see is getting step 4 to make the required changes to T. Embedding that logic in the database could be detrimental to maintenance, but this is no different from embedding large amounts of business logic in stored procedures.
Either way, SQL Server CLR functions may be the way to go. You can create them in Visual Studio 2008 or 2010 (check the database project types when creating a new project).
Tutorial : http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx
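A minimal sketch of such a CLR function (the NormalizeValue name and the string transformation are placeholders standing in for whatever step 4 actually needs to do):

```csharp
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;

// Sketch: a scalar SQL CLR function that SQL Server can call per row from an
// UPDATE statement; the transformation here is only a placeholder.
public static class RowTransforms
{
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlString NormalizeValue(SqlString input)
    {
        if (input.IsNull)
            return SqlString.Null;

        // Placeholder for the real "step 4" business logic.
        return new SqlString(input.Value.Trim().ToUpperInvariant());
    }
}
```

Once the assembly is deployed with CREATE ASSEMBLY and the function registered, it could be invoked from T-SQL, for example: UPDATE dbo.TargetTable SET Value = dbo.NormalizeValue(Value).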
I have a DataTable which I want to save to a SQLite database table. Here is my dilemma: I don't know which way to go. At most the DataTable would contain 65,000 rows and probably 12 columns.
So, would it be faster to save the DataTable to a CSV file and then bulk insert it into SQLite (which I have no idea how to do), or to loop through all the columns to create parameters and then loop through each individual row in the DataTable to retrieve the values to insert into the database table?
Is there an even better way than what I have listed?
Thanks,
Nathan
Check this question out.
There is a SqlBulkCopy class in the .NET Framework that provides functionality for bulk inserts. Unfortunately, it is supported only for SQL Server databases.
However, tweaking a few settings on your inserts will make them a lot quicker. From what people are reporting, there's not that much of a performance hit with single inserts.
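For instance, a common way to speed up row-by-row SQLite inserts is a single transaction plus a reused parameterized command. A minimal sketch using the System.Data.SQLite provider (the Target table and its two columns are placeholders for your 12-column table):

```csharp
using System.Data;
using System.Data.SQLite; // the System.Data.SQLite provider is assumed here

// Sketch: one transaction plus a reused parameterized command makes 65K
// row-by-row inserts fast in SQLite. Table/column names are placeholders.
static void InsertDataTable(DataTable table, string connectionString)
{
    using (var connection = new SQLiteConnection(connectionString))
    {
        connection.Open();

        using (var transaction = connection.BeginTransaction())
        using (var command = connection.CreateCommand())
        {
            command.Transaction = transaction;
            command.CommandText = "INSERT INTO Target (Id, Value) VALUES (@Id, @Value)";

            var idParam = new SQLiteParameter("@Id");
            var valueParam = new SQLiteParameter("@Value");
            command.Parameters.Add(idParam);
            command.Parameters.Add(valueParam);

            foreach (DataRow row in table.Rows)
            {
                idParam.Value = row["Id"];
                valueParam.Value = row["Value"];
                command.ExecuteNonQuery(); // cheap inside the open transaction
            }

            transaction.Commit(); // one commit instead of 65K implicit ones
        }
    }
}
```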