What is the fastest way to update sql table? - c#

I have a C# app which allows the user to update some columns in a DB. My problem is that I have 300.000 records in the DB, and just updating 50.000 took 30 mins. Can I do something to speed things up?
My update query looks like this:
UPDATE <table> SET UM = 'UM', Code = 'Code' WHERE Material = 'MaterialCode'
My only unique constraint is Material. I read the file the user selects, put the data in a DataTable, and then go row by row, updating the corresponding material in the DB.

Limit the number of indexes in your database, especially if your application updates data very frequently. Each index takes up disk space and slows the adding, deleting, and updating of rows, so you should create new indexes only after analyzing how the data is used, the types and frequencies of the queries performed, and how your queries will use the new indexes.
In many cases the speed advantage of a new index outweighs the disadvantages of the extra space used and the slower row modification. However, avoid redundant indexes and create them only when necessary. For read-only tables, the number of indexes can be increased.

Use a non-clustered index on the table if updates are frequent.
Use a clustered index on the table if updates/inserts are not frequent.
The C# code may not be the problem; your update statement is what matters. The WHERE clause of the update statement is the place to look: you need an indexed column in the WHERE clause.

Another thing: is the Material field indexed? Also, does the WHERE clause have to filter on a varchar value? Couldn't it be an integer-valued field instead?
Performance will be better if you filter on integer fields rather than strings. Not sure if this is possible for you.
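A minimal sketch of what both answers suggest, assuming the table is called dbo.Materials and the columns are nvarchar (these names are placeholders, not details from the question): make sure Material has an index, e.g. CREATE NONCLUSTERED INDEX IX_Materials_Material ON dbo.Materials(Material), then reuse a single parameterized command inside one transaction instead of building a fresh statement for every row.

// Sketch only: "dbo.Materials", the column sizes and the connection string are assumptions.
using System.Data;
using System.Data.SqlClient;

static void UpdateMaterials(DataTable rows, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // One transaction, one prepared command, reused for every row in the DataTable.
        using (var tx = conn.BeginTransaction())
        using (var cmd = new SqlCommand(
            "UPDATE dbo.Materials SET UM = @um, Code = @code WHERE Material = @material",
            conn, tx))
        {
            cmd.Parameters.Add("@um", SqlDbType.NVarChar, 50);
            cmd.Parameters.Add("@code", SqlDbType.NVarChar, 50);
            cmd.Parameters.Add("@material", SqlDbType.NVarChar, 50);

            foreach (DataRow row in rows.Rows)
            {
                cmd.Parameters["@um"].Value = row["UM"];
                cmd.Parameters["@code"].Value = row["Code"];
                cmd.Parameters["@material"].Value = row["Material"];
                cmd.ExecuteNonQuery();
            }
            tx.Commit();
        }
    }
}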

Related

Indexing a C# Datatable to search like you would with a SQL query

I couldn't find any similar questions on Stack Overflow.
Is there a way to create a virtual SQL database with indexes in memory? Or maybe a built-in function for creating indexes on DataTables, to quickly search a column in a table multiple times? I'm trying to compare each row of table A against the indexed entry in table B (rather than looping through every row of table B completely for each row in table A).
Right now I'm creating a Dictionary<T, int> index, where T is the type of the column being indexed and int is the row number. When I create an index, I cycle through all rows of the table and create a dictionary key from the column value (with the dictionary value being the row number). This works for unique primary keys, and I've used a variation with int lists when there are multiple rows for a given key.
This works when trying to find an exact value in another table, but not if I want to perform a comparison and find all int keys greater than a specific value. I could probably reinvent the wheel with a sorted binary search tree (especially since the table data would be static), but I would rather use an existing solution without the risk of introducing my own code errors.
Create an in-memory SQLite database and then you can use all of its benefits.
https://sqlite.org/inmemorydb.html
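A minimal sketch of that idea using the Microsoft.Data.Sqlite package; the table and column names here are made up for illustration:

// Sketch only: "TableB", "KeyValue" and "Payload" are placeholder names.
using Microsoft.Data.Sqlite;   // NuGet package: Microsoft.Data.Sqlite

using (var conn = new SqliteConnection("Data Source=:memory:"))
{
    conn.Open();   // the database exists only while this connection stays open

    using (var create = conn.CreateCommand())
    {
        create.CommandText = "CREATE TABLE TableB (KeyValue INTEGER, Payload TEXT)";
        create.ExecuteNonQuery();
        create.CommandText = "CREATE INDEX IX_TableB_KeyValue ON TableB(KeyValue)";
        create.ExecuteNonQuery();
    }

    // ... bulk-load TableB here, ideally inside a single transaction ...

    // Indexed range lookup: all rows with a key greater than some value.
    using (var query = conn.CreateCommand())
    {
        query.CommandText = "SELECT KeyValue, Payload FROM TableB WHERE KeyValue > $min";
        query.Parameters.AddWithValue("$min", 42);
        using (var reader = query.ExecuteReader())
        {
            while (reader.Read())
            {
                long key = reader.GetInt64(0);
                string payload = reader.GetString(1);
                // compare the row from table A against (key, payload) here
            }
        }
    }
}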

Updating currently-read row using ADO.NET

I need to update a column in a large table (over 30 million rows) that has no primary key. Each row has a unique email address column. The update involves generating a value, which must happen in C#, and appending it to a column value. So the row must be read, the column value updated, and written back out.
I was hoping there was a concept of cursoring in ADO.NET, but I don't see one. I can read the rows quickly enough, but the update call, using a WHERE clause on the email address, takes forever. After researching this, most answers seem to be "put in a primary key!", but that is not an option here. Any thoughts?
For a 30-million-row heap there aren't many options. Without any index you can do basically nothing to speed it up.
The only thing worth checking is the fragmentation of the heap. You could add a clustered index to remove the table fragmentation, then drop it immediately. But if you cannot alter that table in any way, it could be faster to move all the data into a new table :-)
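A rough sketch of that suggestion, in case it helps: build an index so each per-row UPDATE becomes a seek, run the updates, then drop the index again. The table name dbo.BigTable and the columns Email/SomeColumn are placeholders, not from the question, and building a clustered index on a 30M-row heap will itself take time and space.

// Sketch only: table, column names and sizes are assumptions.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void RunUpdates(string connectionString, IEnumerable<KeyValuePair<string, string>> work)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Build an index so each WHERE Email = ... becomes a seek instead of a table scan.
        using (var ddl = new SqlCommand(
            "CREATE UNIQUE CLUSTERED INDEX IX_BigTable_Email ON dbo.BigTable(Email)", conn))
        {
            ddl.CommandTimeout = 0;   // the index build on 30M rows can take a while
            ddl.ExecuteNonQuery();
        }

        using (var cmd = new SqlCommand(
            "UPDATE dbo.BigTable SET SomeColumn = @value WHERE Email = @email", conn))
        {
            cmd.Parameters.Add("@value", SqlDbType.NVarChar, 400);
            cmd.Parameters.Add("@email", SqlDbType.NVarChar, 320);
            foreach (var pair in work)   // pair.Key = email, pair.Value = value generated in C#
            {
                cmd.Parameters["@email"].Value = pair.Key;
                cmd.Parameters["@value"].Value = pair.Value;
                cmd.ExecuteNonQuery();
            }
        }

        // Drop it again if the table really must stay a heap.
        using (var drop = new SqlCommand("DROP INDEX IX_BigTable_Email ON dbo.BigTable", conn))
            drop.ExecuteNonQuery();
    }
}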

Efficient insert statement

I'm looking for an efficient way of inserting records into SQL server for my C#/MVC application. Anyone know what the best method would be?
Normally I've just done a while loop and insert statement within, but then again I've not had quite so many records to deal with. I need to insert around half a million, and at 300 rows a minute with the while loop, I'll be here all day!
What I'm doing is looping through a large holding table and using its rows to create records in a different table. I've set up some functions for looking up data needed for the new table, and this is no doubt adding to the drain.
So here is the query I have. Extremely inefficient for large amounts of data!
Declare @HoldingID int
Set @HoldingID = (Select min(HoldingID) From Holding)
While @HoldingID IS NOT NULL
Begin
    Insert Into Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
    Select
        dbo.GetHubIDFromName(StartHubName),
        dbo.GetHubIDFromName(EndHubName),
        dbo.GetBusIDFromName(CompanyName),
        JourneyNo, 1
    From Holding
    Where HoldingID = @HoldingID
    Set @HoldingID = (Select MIN(HoldingID) From Holding Where HoldingID > @HoldingID)
End
I've heard about set-based approaches - is there anything that might work for the above problem?
If you want to insert a lot of data into SQL Server then you should use BULK INSERT - there is a command-line tool called the bcp utility for this, and also a C# wrapper (SqlBulkCopy) for performing bulk copy operations, but under the covers they all use BULK INSERT.
Depending on your application you may want to insert your data into a staging table first, and then either MERGE or INSERT INTO ... SELECT to transfer those rows from the staging table(s) to the target table(s). If you have a lot of data then this will take some time, but it will be a lot quicker than performing the inserts individually.
If you want to speed this up further there are various things you can do, such as changing the recovery model or tweaking/removing triggers and indexes (depending on whether or not this is a live database). If it's still really slow then you should look into doing this process in batches (e.g. 1000 rows at a time).
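For example, a minimal SqlBulkCopy sketch; the staging table dbo.HoldingStaging is an assumed name, and its columns must match the DataTable being loaded:

// Sketch only: "dbo.HoldingStaging" and the DataTable contents are assumptions.
using System.Data;
using System.Data.SqlClient;

static void BulkLoad(DataTable rows, string connectionString)
{
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.HoldingStaging";
        bulk.BatchSize = 5000;        // commit in chunks rather than one huge batch
        bulk.BulkCopyTimeout = 0;     // no timeout for large loads
        bulk.WriteToServer(rows);     // column names/order must match the destination table
    }
}
// Afterwards, a single INSERT INTO ... SELECT (or MERGE) moves the rows
// from the staging table into the target table in one set-based statement.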
This single set-based statement should do exactly what you are doing now:
Insert Into Journeys(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
ORDER BY HoldingID ASC
You are (probably) able to do it in one statement of the form
INSERT INTO JOURNEYS
SELECT * FROM HOLDING;
Without more information about your schema it is difficult to be absolutely sure.
SQL Server 2008 introduced table-valued parameters. These allow you to insert multiple rows in a single round trip to the database (sent as one large blob), without using a temporary table. This article describes how it works (step four in the article):
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
It differs from bulk inserts in that you do not need special utilities and all constraints and foreign keys are checked.
I quadrupled my throughput using this and parallelizing the inserts. Now at 15,000 inserts/second sustained into the same table, a regular table with indexes and over a billion rows.
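A rough sketch of the table-valued-parameter approach; the type name, procedure name and columns below are placeholders, not something from the answer or the linked article:

// Sketch only: the type name, proc name and columns are placeholders.
// Server-side, something like this must already exist:
//   CREATE TYPE dbo.JourneyTableType AS TABLE
//       (DepartureID int, ArrivalID int, ProviderID int, JourneyNumber int, Active bit);
//   CREATE PROCEDURE dbo.InsertJourneys @rows dbo.JourneyTableType READONLY
//   AS INSERT INTO Journeys SELECT * FROM @rows;
using System.Data;
using System.Data.SqlClient;

static void InsertJourneys(DataTable journeyRows, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.InsertJourneys", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;

        var p = cmd.Parameters.AddWithValue("@rows", journeyRows);
        p.SqlDbType = SqlDbType.Structured;    // marks it as a table-valued parameter
        p.TypeName = "dbo.JourneyTableType";

        conn.Open();
        cmd.ExecuteNonQuery();                 // all rows travel in one round trip
    }
}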

Enum Vs Inner Join / Where

I have various text values represented by ints. I store the int value in the data table for better and faster searching. I have three options for displaying the text value:
I declare an enum in my code and display the text value according to the int value. This is static, and I have to change code if a new value is added.
To make it dynamic, I can store the int and text values in a table that lives in another database owned by the admin. New values can be added by the admin in this table. I use an inner join to display the text value whenever a record is fetched.
I store the actual text in the respective data table. This will make searching slow.
My question is: which option is best to use under the following conditions?
Each data table has between 1 and 10 million records.
There are more than 5000 users fetching, searching, and updating these tables.
There are at most 12 text values, each at most 50 characters long.
There are 30 data tables with the above conditions and functions.
I like a combination of option #2 and option #1: use ints, but keep the dictionary table in another database.
Let me explain:
store the int and text in a table in another database;
in the origin table, store the int only;
do not join to the table in the other database to get the text; instead cache the dictionary on the client and resolve the text from that cache (see the sketch below).
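A minimal sketch of that client-side cache, assuming a lookup table named dbo.StatusLookup with columns Id and DisplayText (both names are made up):

// Sketch only: "dbo.StatusLookup", "Id" and "DisplayText" are assumed names.
using System.Collections.Generic;
using System.Data.SqlClient;

static Dictionary<int, string> LoadLookup(string connectionString)
{
    var lookup = new Dictionary<int, string>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT Id, DisplayText FROM dbo.StatusLookup", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                lookup[reader.GetInt32(0)] = reader.GetString(1);
        }
    }
    return lookup;   // cache this (e.g. in a static field) and refresh when the admin changes values
}

// Usage: display lookup[statusId] instead of joining across databases on every fetch.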
I would not go for option 1, for the reason given: enums are not meant to be lookups. You could replace option 1 with building a dictionary in code, but again it would need to be recompiled each time a value changes, which is bad.
Storing the text in the data table (i.e. option 3) is bad when it is guaranteed to be duplicated a lot, as here. This is exactly where you should use a lookup table, as you suggest in option 2.
So yes, store the values in a database table and administer them through that.
The join shouldn't take long at all if it is just to a small table. If you are worried, though, an alternative is to load the lookup table into a dictionary in code the first time you need it and resolve the values from that instead of joining. I doubt you'll have problems just doing the join, though.
And I'd take this approach no matter what the conditions are (number of records, etc.). The conditions do make it more sensible though. :)
If you have literally millions of records, there's almost certainly no point in trying to spin up such a structure in server code or on the client in any form. It needs to be kept in a database, IMHO.
The query that creates the list needs to be smart enough to constrain the count of returned records to a manageable number. Perhaps partitioned views or stored procedures might help in this regard.
If this is primarily a read-only list, with updates only done in the context of management activities, it should be possible to make queries against the table very rapid with proper indexes and queries on the client side.

Compare an array with a "very large" table of a SQL Server database

In a C# program I have an array with about 100,000 elements.
Then I have a SQL Server 2008 table whose primary key column contains more or less all elements of the array (but a few are missing). The table can have up to 30,000,000 rows.
Now I want to determine which elements of the array do not exist in the table. How can this be achieved efficiently?
The most efficient method would probably be to bulk-insert those 100,000 elements into a temp table and then perform the comparison within the database itself.
(Note that I haven't tested this theory; it's just an educated guess.)
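A sketch of that approach, assuming the array holds ints and the big table is dbo.BigTable with primary key column Id (all assumed names): bulk-insert the keys into a temp table on the same connection, then let the server find the missing ones.

// Sketch only: "#Keys", "dbo.BigTable" and "Id" are placeholder names.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static List<int> FindMissingKeys(int[] keys, string connectionString)
{
    var missing = new List<int>();
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // 1. Temp table to hold the ~100,000 candidate keys.
        using (var create = new SqlCommand("CREATE TABLE #Keys (KeyValue int PRIMARY KEY)", conn))
            create.ExecuteNonQuery();

        // 2. Bulk-insert the array (SqlBulkCopy needs a DataTable or IDataReader).
        var dt = new DataTable();
        dt.Columns.Add("KeyValue", typeof(int));
        foreach (var k in keys) dt.Rows.Add(k);

        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#Keys" })
            bulk.WriteToServer(dt);

        // 3. Let the server do the comparison against the 30M-row table.
        using (var cmd = new SqlCommand(
            "SELECT k.KeyValue FROM #Keys k " +
            "WHERE NOT EXISTS (SELECT 1 FROM dbo.BigTable b WHERE b.Id = k.KeyValue)", conn))
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                missing.Add(reader.GetInt32(0));
        }
    }
    return missing;
}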
Query the table with a
select <primarykey> from <table> where <primarykey> in (<primary keys of your list of elements in C#>)
This should be faster than inserting all rows into a table and then checking for missing elements with an EXCEPT/MINUS, because it does not involve any write operation.
Once you have the list of primary keys that are common, pull it back into C# and compare.
A way to avoid creating temp tables is to use a stored procedure that accepts a table-valued parameter of a user-defined table type (UDTT). This table type would have a schema of one column, with a data type matching that of your array.
If you populate a DataTable (with a schema matching the UDTT schema) with your array values and supply the DataTable as the stored proc's parameter, you can pass up all 100,000 of your items in their SQL binary format. The proc can then just do a join between the 30M-row table and the table-valued parameter, returning the items in the TVP table with no match in the master table.
This avoids needing to build massive IN statements.
EDIT Regarding the comment from @Kyro below:
I'm now less confident in this approach. I found an article showing the under-the-covers row-by-row inserts that Kyro describes. What you might gain by sending binary data over the network rather than a large T-SQL WHERE IN() statement may well be taken away by the performance on the SQL side. However, it's a fairly simple code approach, so it might be worth a quick test. Let us know how you get on?
