Concurrent inserts into database - c#

I'm building a program that takes push data from six different sources and inserts the data into a database. Each source has its own function to execute the inserts as soon as they come, but all sources write to the same table.
I have the following questions:
If one source is currently writing to the table and another source begins to write at the same time is there any chance the inserts will block each other?
The table is also constantly being used to read the data via a view that joins some more tables to show the data, can this pose any problems?
Currently each source has its own DB connection to write data, would it be better to have only one connection, or have each use its own?

If one source is currently writing to the table and another source
begins to write at the same time is there any chance the inserts will
block each other?
It depends on the indexes. If the index keys have the same or contiguous values, you may see short-term blocking for the duration of the insert transaction.
The table is also constantly being used to read the data via a view
that joins some more tables to show the data, can this pose any
problems?
It depends on the isolation level. No blocking will occur if:
- the SELECT queries run at the READ COMMITTED isolation level and the READ_COMMITTED_SNAPSHOT database option is turned on,
- the SELECT queries don't touch uncommitted data, or
- the SELECT queries run at the READ UNCOMMITTED isolation level.
Even if blocking does occur, it may be short-lived if the INSERT transactions are short.
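As a rough illustration of the last point, here is a minimal C# sketch that runs the reporting SELECT under READ UNCOMMITTED so it never waits on in-flight inserts; the view name dbo.SourceDataView is just a placeholder:

using System;
using System.Data;
using System.Data.SqlClient;

class ReportReader
{
    public static void DumpView(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // READ UNCOMMITTED: the reader never waits on writers, at the cost
            // of possibly seeing uncommitted ("dirty") rows.
            using (var tx = conn.BeginTransaction(IsolationLevel.ReadUncommitted))
            {
                using (var cmd = new SqlCommand("SELECT * FROM dbo.SourceDataView", conn, tx))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader[0]);
                }
                tx.Commit();
            }
        }
    }
}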
Currently each source has its own DB connection to write data, would
it be better to have only one connection, or have each use its own?
It depends on the problem you are trying to solve. A single connection ensures the inserts can't block or deadlock each other, but that may not be an issue here anyway.
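If you keep a connection per source, one low-risk shape is to let each source's insert function open a pooled connection per insert (or per small batch) rather than sharing one long-lived connection object across threads. A minimal sketch; the table and column names are hypothetical:

using System.Data.SqlClient;

class SourceWriter
{
    private readonly string _connectionString;

    public SourceWriter(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void Insert(int sourceId, string payload)
    {
        // Opening/closing here is cheap: ADO.NET pools the underlying physical
        // connections, so this does not create a new connection for every insert.
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO dbo.PushData (SourceId, Payload, ReceivedAt) " +
            "VALUES (@sourceId, @payload, GETUTCDATE())", conn))
        {
            cmd.Parameters.AddWithValue("@sourceId", sourceId);
            cmd.Parameters.AddWithValue("@payload", payload);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

Note that SqlConnection instances are not thread-safe, so sharing a single connection across the six sources would also require serializing access to it.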

Please find the inline answers below:
If one source is currently writing to the table and another source begins to write at the same time is there any chance the inserts will block each other?
In this case the second insert will wait for the first one; it stays in a waiting state until the first insert completes.
The table is also constantly being used to read the data via a view that joins some more tables to show the data, can this pose any problems?
No problem.
Currently each source has its own DB connection to write data, would it be better to have only one connection, or have each use its own?
It's better to have one DB connection.

Block "each other" i.e. dead-lock is not possible.
No problem. Only if select is too slow, it can delay next insert.
No problem with different connections.

Related

AWS SQS with mysql and C# with hundreds of thousands messages

I am working with an AWS SQS queue. The queue may hold a massive number of messages, i.e. if I do not keep up with processing there will be more than a million messages per hour.
I am processing all the messages and putting them into a MySQL table (InnoDB, 22 columns) with INSERT ... ON DUPLICATE KEY UPDATE. I have a primary key and a unique key.
I am working in C#, where I run 80 threads in order to pull messages from SQS.
I wrapped the "insert on duplicate key update" query in a transaction in C#.
At the same time I am using a C# lock so that only a single thread can update the table; if I do not use the C# lock, MySQL throws a deadlock exception.
The problem is that I can see a lot of threads waiting at the C# lock, and this wait time is gradually increasing. Can anybody suggest the best way to do this?
Note: I have 8 GB RAM, an Intel Xeon 2.53 GHz, and a 1 Gb internet connection. Please advise.
If I were doing it, the C# program would primarily be creating the CSV file to empty your SQS queue, or at least a significant chunk of it. The file would then be used for a bulk insert into an empty, non-indexed worktable. I would steer toward a non-temporary table, but whatever; I see no reason to add TEMPORARY to the mix when this is recurring, and when done the worktable is truncated anyway.
The bulk insert would be achieved through LOAD DATA INFILE fired off from the C# program. Alternatively, a new row could be written to some other table with an incrementing value saying file 2 is ready, file 3 is ready, and the LOAD could happen in an event triggered, say, every n minutes - an event put together with MySQL's CREATE EVENT. Six of one, half a dozen of the other.
But a sentinel, a mutex, might be of value, as this whole thing happens in batches, and the next batch(es) to be processed need to be suspended while this occurs. Let's call this concept The Blocker, and say the batch being worked on is row N.
OK, now your data is in the worktable, and it is safe from being stomped on until processed. Let's say you have 250k rows, with other batches shortly to follow. If you have special processing that needs to happen, you may wish to create indexes; but at this moment there are none.
You perform a normal INSERT ... ON DUPLICATE KEY UPDATE (IODKU) into the REAL table using this worktable. The IODKU follows a normal insert into ... select pattern, where the select part comes from the worktable.
At the end of that statement, the worktable is truncated, any indexes dropped, row N has its status set to complete, and The Blocker is free to work on row N+1 when it appears.
The indexes are dropped to facilitate the next round of bulk inserts, where maintaining indexes is of least importance; indexes on the worktable may very well be unnecessary overhead during the IODKU anyway.
In this manner, you get the best of both worlds:
LOAD DATA INFILE
IODKU
And the focus is taken off of multi-threading, a good thing to take one's focus off of.
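A rough sketch of that flow, fired from C# with MySql.Data; the table, column and file names are hypothetical, and the server must permit LOAD DATA INFILE (FILE privilege, secure_file_priv):

using MySql.Data.MySqlClient;

class BatchLoader
{
    public static void LoadBatch(string connectionString, string csvPath)
    {
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();

            // 1. Bulk-load the CSV into the empty, index-free worktable.
            Exec(conn,
                "LOAD DATA INFILE '" + csvPath.Replace("\\", "/") + "' " +
                "INTO TABLE worktable " +
                "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");

            // 2. One IODKU from the worktable into the real table
            //    (the insert ... select pattern described above).
            Exec(conn,
                "INSERT INTO real_table (id, col1, col2) " +
                "SELECT id, col1, col2 FROM worktable " +
                "ON DUPLICATE KEY UPDATE col1 = VALUES(col1), col2 = VALUES(col2)");

            // 3. Reset the worktable for the next batch.
            Exec(conn, "TRUNCATE TABLE worktable");
        }
    }

    private static void Exec(MySqlConnection conn, string sql)
    {
        using (var cmd = new MySqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0; // 0 = no timeout; large batches can take a while
            cmd.ExecuteNonQuery();
        }
    }
}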
Here is a nice article on performance and strategies titled Testing the Fastest Way to Import a Table into MySQL. Don't let the MySQL version in the title or inside the article scare you away. Jumping to the bottom and picking up some conclusions:
The fastest way you can import a table into MySQL without using raw files is the LOAD DATA syntax. Use parallelization for InnoDB for better results, and remember to tune basic parameters like your transaction log size and buffer pool. Careful programming and importing can make a >2-hour problem become a 2-minute process. You can disable temporarily some security features for extra performance.
I would separate the C# routine entirely from the actual LOAD DATA and IODKU update effort and leave those to the event mentioned above (CREATE EVENT), for several reasons - mainly better design. That way the C# program only deals with SQS and with writing out files with incrementing file numbers.
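If the database-side scheduling route is taken, the recurring event itself is a one-time piece of DDL; a hypothetical sketch, run once from C# (load_pending_files is a placeholder procedure, and the server's event_scheduler must be ON):

using MySql.Data.MySqlClient;

class LoadEventSetup
{
    public static void CreateLoadEvent(string connectionString)
    {
        const string ddl =
            "CREATE EVENT IF NOT EXISTS load_next_batch " +
            "ON SCHEDULE EVERY 5 MINUTE " +
            "DO CALL load_pending_files()";

        using (var conn = new MySqlConnection(connectionString))
        using (var cmd = new MySqlCommand(ddl, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}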

Deadlock on transaction with multiple tables

My scenario is common:
I have a stored procedure that needs to update multiple tables.
If one of the updates fails, all the updates should be rolled back.
The straightforward answer is to include all the updates in one transaction and just roll that back. However, in a system like ours, this causes concurrency issues.
When we break the updates into multiple short transactions, we get a throughput of ~30 concurrent executions per second before deadlocking issues start to emerge.
If we put them into one transaction that spans all of the updates, we get ~2 concurrent executions per second before deadlocks show up.
In our case, we place a try-catch block around every short transaction and manually DELETE/UPDATE back the changes from the previous ones, so essentially we mimic the transaction behavior in a very expensive way...
It works alright, since it is well written and we don't get many "rollbacks"...
One thing this approach cannot resolve at all is the case of a command timeout from the web server/client.
I have read extensively in many forums and blogs and scanned through MSDN, and I cannot find a good solution. Many have presented the problem, but I have yet to see a good solution.
The question is this: is there ANY solution to this issue that allows a stable rollback of updates to multiple tables, without requiring an exclusive lock on all of the rows for the entire duration of the long transaction?
Assume that it is not an optimization issue. The tables are probably close to maximally optimized and can give very high throughput as long as deadlocks don't hit them. There are no table locks/page locks etc., only row locks on updates - but when you have so many concurrent sessions, some of them need to update the same row...
The solution can be via SQL, client-side C#, or server-side C# (extend SQL Server?).
Is there such a solution in any book/blog that I have not found?
We are using SQL Server 2008 R2, with a .NET client/web server connecting to it.
Code example:
Create procedure sptest as
begin
    Begin transaction
    Update table1 ...
    Update table2 ...
    Commit transaction
end
In this case, if sptest is run twice, the second instance cannot update table1 until instance 1 has committed.
Compared to this
Create procedure sptest2 as
begin
    Update table1 ...
    Update table2 ...
end
sptest2 has a much higher throughput - but it has a chance of corrupting the data.
This is what we are trying to solve. Is there even a theoretical solution to this?
Thanks,
JS
I would say that you should dig deeper to find out the reason why the deadlocks occur. Possibly you should change the order of the updates to avoid them. Maybe some index is "guilty".
You cannot roll back changes if other transactions can change the data, so you need to hold update locks on those rows. But you can use the snapshot isolation level to allow consistent reads before the update commits.
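A minimal C# sketch of that snapshot-read suggestion; it assumes ALLOW_SNAPSHOT_ISOLATION has been enabled for the database, and dbo.SomeView is a placeholder:

using System.Data;
using System.Data.SqlClient;

class SnapshotReader
{
    public static DataTable ReadConsistent(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // Snapshot isolation: the reader sees the row versions as of the start
            // of the transaction and is not blocked by the concurrent updates.
            using (var tx = conn.BeginTransaction(IsolationLevel.Snapshot))
            {
                var result = new DataTable();
                using (var cmd = new SqlCommand("SELECT * FROM dbo.SomeView", conn, tx))
                using (var adapter = new SqlDataAdapter(cmd))
                {
                    adapter.Fill(result);
                }
                tx.Commit();
                return result;
            }
        }
    }
}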
For all inner-joined tables that are mostly static, or where dirty data would with a high degree of probability not affect the query, you can apply:
INNER JOIN LookupTable lut WITH (NOLOCK) ON lut.ID = SomeOtherTableID
This tells the query that you do not care about any in-flight updates to LookupTable when joining it to SomeOtherTable.
This can reduce your issue in most cases. For more difficult deadlocks I have implemented a deadlock graph that is generated and emailed when a deadlock occurs and contains all the detailed info about the deadlock.

I have roughly 30M rows to Insert Update in SQL Server per day what are my options?

I have roughly 30M rows to insert or update in SQL Server per day - what are my options?
If I use SqlBulkCopy, does it handle not inserting data that already exists?
In my scenario I need to be able to run this over and over with the same data without duplicating data.
At the moment I have a stored procedure with an update statement and an insert statement which read data from a DataTable.
What should I be looking for to get better performance?
The usual way to do something like this is to maintain a permanent work table (or tables) that have no constraints on them. Often these might live in a separate work database on the same server.
To load the data, you empty the work tables, blast the data in via BCP/bulk copy. Once the data is loaded, you do whatever cleanup and/or transforms are necessary to prep the newly loaded data. Once that's done, as a final step, you migrate the data to the real tables by performing the update/delete/insert operations necessary to implement the delta between the old data and the new, or by simply truncating the real tables and reloading them.
Another option, if you've got something resembling a steady stream of data flowing in, might be to set up a daemon to monitor for the arrival of data and then do the inserts. For instance, if your data arrives as flat files dropped into a directory via FTP or the like, the daemon can monitor the directory for changes and do the necessary work (as above) when files arrive.
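For the directory-watching variant, the built-in FileSystemWatcher is usually enough as a starting point. A minimal sketch with a hypothetical folder and filter, and with the actual load left as a stub:

using System.IO;

class DropFolderWatcher
{
    private readonly FileSystemWatcher _watcher;

    public DropFolderWatcher(string dropFolder)
    {
        _watcher = new FileSystemWatcher(dropFolder, "*.csv");
        _watcher.Created += (sender, e) => LoadFile(e.FullPath);
        _watcher.EnableRaisingEvents = true;
    }

    private void LoadFile(string path)
    {
        // In practice: wait until the file is fully written (e.g. retry until it
        // can be opened exclusively), bulk-load it into the work table, run the
        // cleanup/transforms, then migrate the delta to the real tables as above.
    }
}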
One thing to consider, if this is a production system, is that doing massive insert/delete/update statements is likely to cause blocking while the transaction is in-flight. Also, a gigantic transaction failing and rolling back has its own disadvantages:
The rollback can take quite a while to process.
Locks are held for the duration of the rollback, so more opportunity for blocking and other contention in the database.
Worst of all, after all that happens, you've achieved no forward motion, so to speak: a lot of time and effort and you're right back where you started.
So, depending on your circumstances, you might be better off doing your insert/update/deletes in smaller batches so as to guarantee that you achieve forward progress. 30 million rows over 24 hours works out to be c. 350 per second.
Bulk insert into a holding table, then perform either a single MERGE statement or an UPDATE and an INSERT statement. Either way, you want to compare the rows in your holding table against the target table to see which action to perform.
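A sketch of that holding-table approach in C#: SqlBulkCopy fills the holding table, then one MERGE applies the delta. MERGE needs SQL Server 2008 or later; on 2005 use a separate UPDATE and INSERT as noted above. All table, column and key names are hypothetical:

using System.Data;
using System.Data.SqlClient;

class DailyLoader
{
    public static void Load(string connectionString, DataTable rows)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1. Clear and refill the holding table as fast as possible.
            using (var truncate = new SqlCommand("TRUNCATE TABLE dbo.Holding", conn))
                truncate.ExecuteNonQuery();

            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.Holding";
                bulk.BatchSize = 10000;
                bulk.WriteToServer(rows);
            }

            // 2. Apply the delta to the real table in one statement.
            const string merge = @"
MERGE dbo.Target AS t
USING dbo.Holding AS s
   ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Col1 = s.Col1, t.Col2 = s.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2) VALUES (s.Id, s.Col1, s.Col2);";

            using (var cmd = new SqlCommand(merge, conn))
            {
                cmd.CommandTimeout = 600; // large merges can exceed the 30-second default
                cmd.ExecuteNonQuery();
            }
        }
    }
}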

DataReader and SQLCommand

As they say on the radio - long time listener first time caller....
Here's my issue. VS 2005 SQL Server 2005 Database. Windows Forms app. C#. Big table - 780K records. I'll call it the source table. Need to loop through the source table, and for each record do something with another table, then write back to the source table that it was completed. I haven't got as far as updating the second table yet...
I loop through all records of the source table using a DataReader on connection object A. For each record I build an update statement to mark that record as processed in the source table, and run it with a SqlCommand on connection object B - different connection objects because I know the DataReader wants the connection to itself.
Here's the issue: after processing X records - where X seems to be about 60 - the update times out.
While writing this - funny how that happens, isn't it - my brain tells me this has to do with transaction isolation and/or locking, i.e. I'm reading through the source records using a DataReader while changing those same records. I can see this causing problems under different transaction isolation levels, so I'll look into that...
Anyone seen this and know how to solve it?
Cheers
Pete
Without more detail, the possibilities for solutions are very numerous. As iivel noted, you could perform all of the activities within the database itself -- if that is possible for the type of operations you must perform. I would just add that it would very likely be best to do such a thing using set-based operations rather than cursors.
If you must perform this operation outside of the database, then your source query can be executed without any locks at all, if that is a safe thing to do in your case. You can do that by setting the isolation level or by simply appending with (nolock) after your table name / alias in the query.
There are several other options as well. For instance, you could operate in batches, instead, such as pulling 1000 records at a time from the source table into memory, disconnecting, and then performing whatever operations and updates you need. Once all 1000 records are processed, work on another set of 1000 records, and so on, until all records in the queue have been processed.
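A rough sketch of that batching idea applied to the scenario above, with hypothetical table and column names (an Id key and a Processed flag):

using System.Collections.Generic;
using System.Data.SqlClient;

class BatchProcessor
{
    public static void ProcessAll(string connectionString)
    {
        while (true)
        {
            var ids = new List<int>();
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // Grab the next 1000 unprocessed rows; no reader is left open
                // while updating, so the update cannot block on our own read.
                using (var cmd = new SqlCommand(
                    "SELECT TOP 1000 Id FROM dbo.Source WHERE Processed = 0", conn))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        ids.Add(reader.GetInt32(0));
                }

                if (ids.Count == 0)
                    return; // nothing left to do

                foreach (var id in ids)
                {
                    // ... do the per-record work against the second table here ...

                    using (var update = new SqlCommand(
                        "UPDATE dbo.Source SET Processed = 1 WHERE Id = @id", conn))
                    {
                        update.Parameters.AddWithValue("@id", id);
                        update.ExecuteNonQuery();
                    }
                }
            }
        }
    }
}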
Sounds like you need to set the CommandTimeout property on the command object.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout(v=vs.80).aspx
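For reference, the property in context: the default is 30 seconds, and 0 means wait indefinitely. The query text is just a placeholder:

using System.Data.SqlClient;

class TimeoutExample
{
    public static void RunSlowUpdate(SqlConnection openConnection)
    {
        // Assumes openConnection is already open.
        using (var cmd = new SqlCommand(
            "UPDATE dbo.Source SET Processed = 1 WHERE Id = @id", openConnection))
        {
            cmd.Parameters.AddWithValue("@id", 42);
            cmd.CommandTimeout = 120; // seconds; default is 30
            cmd.ExecuteNonQuery();
        }
    }
}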
Is there any reason you can't have the select/updates in a cursor and return a final result tally to the app? If the process belongs in the DB, it is best to keep it there.
Alternatively, updating the command timeout as John mentioned is your only other bet.

Insert data into SQL server with best performance

I have an application which intensively uses DB (SQL Server).
As it must have high performance, I would like to know the fastest way to insert records into the DB - fastest from the standpoint of execution time.
What should I use?
As far as I know, the fastest way is to create a stored procedure and call it from code (ADO.NET).
Please let me know if there is any better way, or maybe there are some other practices to increase performance.
A bulk insert would be the fastest since it is minimally logged; perhaps you can use the SqlBulkCopy class.
"It depends".
How many rows are you talking about inserting?
How frequently will they be inserted?
What other database operations will be taking place at the same time?
Will the rows be inserted because of user action (clicking a button), or because of some external stimulus?
Based on your update, I think you should consider mechanisms other than simple code. Look into SQL Server Integration Services, which is optimized for bulk database operations. It's possible that what you need is a simple SSIS job that runs periodically to do a bulk insert on all "new" data meeting particular criteria. It would allow modification over time to use things like staging tables or intermediate servers if that should prove necessary.
Please let me know if there is any better way, or maybe there are some other practices to increase performance.
Do not open one connection per record. Do learn how connection pooling works: repeatedly opening and closing a connection usually just reuses an underlying pooled connection rather than creating a new physical one.
If possible, do not open one transaction per record, and do not leave the transaction open for undue periods of time (see the sketch after this list).
Consider table design: narrow tables with few indexes/constraints and no triggers.
If you need a fast insert because you're a web application and need to return a page to the user NOW or you're a winform app and are blocking on the UI thread, consider performing the insert async or on another thread.
If you need a fast insert to import a million line file, consider doing a bulk insert.
If all you want to do is store the data, and not to query it... consider using a file-based solution instead.
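Tying the connection and transaction advice together, a minimal sketch: one pooled connection, one transaction per batch rather than per record, and a single reused parameterized command. The table and column names are hypothetical:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

class FastInserter
{
    public static void InsertBatch(string connectionString, IEnumerable<KeyValuePair<int, string>> rows)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.Measurements (SensorId, Value) VALUES (@id, @value)", conn, tx))
            {
                var pId = cmd.Parameters.Add("@id", SqlDbType.Int);
                var pValue = cmd.Parameters.Add("@value", SqlDbType.NVarChar, 100);

                foreach (var row in rows)
                {
                    pId.Value = row.Key;
                    pValue.Value = row.Value;
                    cmd.ExecuteNonQuery();
                }

                tx.Commit(); // one commit per batch keeps transactions short
            }
        }
    }
}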
Have you done the math? 2M/day = 83k/hour = 1388/min = 23/second.
At 23 inserts per second SQL Server won't break a sweat.
