How many concurrent statements does C# SqlConnection support?
Let's say I am working on a Windows service running 10 threads. All threads use the same SqlConnection object but different SqlCommand objects, and perform operations like select, insert, update and delete on either different tables or the same table but different data. Will it work? Will a single SqlConnection object be able to handle 10 simultaneous statements?
You can technically have multiple "in-flight" statements, but only one actually executing.
A single SqlConnection maps to a single Connection and Session in SQL Server. In SQL Server a session can only have a single request active at a time. If you enable MultipleActiveResultSets you can start a new query before the previous one is finished, but the statements are interleaved, never run in parallel.
MARS enables the interleaved execution of multiple requests within a single connection. That is, it allows a batch to run, and within its execution, it allows other requests to execute. Note, however, that MARS is defined in terms of interleaving, not in terms of parallel execution.
And
execution can only be switched at well defined points.
https://learn.microsoft.com/en-us/sql/relational-databases/native-client/features/using-multiple-active-result-sets-mars?view=sql-server-ver15
So you can't even guarantee that another statement will run whenever one becomes blocked. If you want to run statements in parallel, you need multiple SqlConnections.
Note also that a single query might use a parallel execution plan, and have multiple tasks running in parallel.
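To make that concrete, here is a minimal sketch of the multiple-connection alternative; the connection string, table names and counting queries are placeholders, not anything from the question:

    using System.Data.SqlClient;
    using System.Threading.Tasks;

    static async Task RunInParallelAsync(string connectionString)
    {
        async Task<int> CountRowsAsync(string sql)
        {
            // Each task opens its own SqlConnection (and thus its own session),
            // so the two statements really do run at the same time.
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                await conn.OpenAsync();
                return (int)await cmd.ExecuteScalarAsync();
            }
        }

        var orders = CountRowsAsync("SELECT COUNT(*) FROM dbo.Orders");
        var customers = CountRowsAsync("SELECT COUNT(*) FROM dbo.Customers");
        await Task.WhenAll(orders, customers); // parallel, one session each
    }

Connection pooling makes the extra Open() calls cheap, so there is no need to share the SqlConnection object itself.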
David Browne gave you the answer to the question you asked, but there might be something else you need to know:
Let's say I am working on a Windows service running 10 threads. All threads use the same SqlConnection object but different SqlCommand objects, and perform operations like select, insert, update and delete on either different tables or the same table but different data.
This design just seems wrong on several fronts:
You keep a disposable resource around and open. My rule for disposable stuff is: "Create. Use. Dispose. All in the same piece of code, ideally using a using block" (see the sketch after this list). Keeping disposable stuff around, or even sharing it between threads, is just not worth the danger of forgetting to close it.
There is no performance advantage: SqlConnection uses internal connection pooling without any side effects. And even if there were a relevant speed advantage, it would not be worth the dangers.
You are using multithreading with database access. Multithreading is one way to implement multitasking, but not one you should use until you need it. Multithreading is only useful for CPU-bound work; otherwise you should generally use async/await or similar approaches. DB operations are either disk or network bound.
There is one exception to this rule, and that is if your application is a server. Servers are the rare example of something being pleasingly parallel, so having a large thread pool to process incoming requests in parallel is very common. It is rather rare that you write one of those, however; mostly you just run your code in an existing server infrastructure that deals with that.
If you do have heavy CPU work, chances are you are retrieving too much. It is a very common beginner's mistake to retrieve a lot and then do the filtering in C# code. Do not do that. Do as much filtering and processing as possible in the query. You will not be able to beat the speed of the DB server, and at best you tie up your network pointlessly.
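As a sketch of the first point's Create/Use/Dispose rule (the connection string and query are placeholders):

    using System.Data.SqlClient;

    public static int GetOrderCount(string connectionString)
    {
        // Create. Use. Dispose. All in the same piece of code.
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.Orders", conn))
        {
            conn.Open(); // cheap thanks to the internal connection pool
            return (int)cmd.ExecuteScalar();
        }
    }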
Related
I want to read / write into a DB from multiple threads.
After some research, I remembered the ACID rules. Do I need to call myTrans = myConnection.BeginTransaction(); every time I want to read/write from inside a thread, in order to keep this Transaction safe from dirty reads/writes (and myTrans.Commit();)? In normal SQL I would use SET TRANSACTION ISOLATION LEVEL SERIALIZABLE to secure it.
How do I do that in C#?
Thanks in advance
You only need to call BeginTransaction() if you need multiple statements included in the same transaction. It's not normally necessary for ACID guarantees on single statements, since each individual statement (each call to ExecuteReader()/ExecuteScalar()/ExecuteNonQuery()/Fill()) runs in its own implicit transaction.
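If you do need several statements in one transaction, a minimal sketch of the serializable case from the question might look like this (the connection string, table and values are placeholders):

    using System.Data;
    using System.Data.SqlClient;

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        // Equivalent to SET TRANSACTION ISOLATION LEVEL SERIALIZABLE.
        using (var tran = conn.BeginTransaction(IsolationLevel.Serializable))
        {
            using (var cmd = new SqlCommand(
                "UPDATE dbo.Accounts SET Balance = Balance - @amt WHERE Id = @id",
                conn, tran))
            {
                cmd.Parameters.AddWithValue("@amt", 100m);
                cmd.Parameters.AddWithValue("@id", 1);
                cmd.ExecuteNonQuery();
            }
            tran.Commit(); // Dispose without Commit rolls the work back
        }
    }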
Even across multiple statements, my tendency is to put the statements into the same long SQL string (or stored procedure) and include any needed transaction instructions as part of the SQL.
In terms of thread safety, the best thing to do is use a separate, brand-new connection object for each transaction and wrap it in a using block. Connections are not thread-safe, and so the way to protect them is to give each thread (or each transaction within a thread) its own connection that it doesn't have to share.
Even within a thread, it's better NOT to re-use the same connection. There is a feature called connection pooling, where the connection object you see in the C# code is a lightweight wrapper for a much heavier actual connection that is shared from a pool. Trying to re-use the same connection object throughout a thread or application optimizes for the light thing at the expense of the heavy thing.
I have a FileShare crawler (getting permissions and dropping them somewhere for later Audit). Currently it is starting multiple threads to crawl the same folder (to speed up the process).
In C#, each SqlConnection object has its own SqlTransaction, initiated by the SqlConnection.BeginTransaction() call.
Here is the pseudo code of the current solution:
Get list of folders
For each folder get list of sub-folders
For each sub folder start a thread to collect file shares
Each thread will save collected data to database
Run Audit reports on the database
The problem arises when one of the sub-folder threads fails: we end up with a partial folder scan which "cannot be detected easily". The main reason is that each thread runs on a separate connection.
I would like each folder to be committed in the same transaction, rather than ending up with an incomplete scan (the current situation when some threads fail). No transaction concept is implemented yet, but I am evaluating the options.
Based on the comments on this answer, a producer/consumer queue would be an option, but unfortunately memory is a limit (due to the number of started threads). If the producer/consumer buffer is spilled to disk to overcome the RAM limit, the execution time goes up (disk I/O being far slower than memory I/O). I guess I am stuck with a memory/time compromise. Any other suggestions?
It is possible to share the same transaction on multiple connections with SQL Server using the obsolete bind transaction feature. I have never used it and I wouldn't base new development on it. It also seems unnecessary here.
Can't you just have all the producers use the same connection and transaction? Put a lock around it. This obviously bottlenecks the process but it might still be fast enough.
You say you execute INSERT statements. For bulk inserts you can use the SqlBulkCopy class which is very much faster. Batch up the rows and only execute a bulk insert when you have >>1000 rows buffered.
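A sketch of that batching pattern; the destination table name and the idea of flushing a buffered DataTable are assumptions about how you'd wire it up:

    using System.Data;
    using System.Data.SqlClient;

    // Buffer produced rows in a DataTable, then flush them in one round trip.
    static void FlushBatch(string connectionString, DataTable batch)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.SharePermissions"; // assumed name
            bulk.WriteToServer(batch); // one bulk insert instead of N INSERTs
        }
        batch.Clear(); // reuse the buffer for the next >>1000 rows
    }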
I don't even see the need for producer/consumer here. It would indeed benefit performance by pipelining production with consumption, but it also introduces far more complex threading. If you want to go this route, you should probably stream the produced rows into SqlBulkCopy through an IDataReader, so they are written to the server directly without intermediate buffering.
I have code that carries out data retrieval - basically it executes anything from 3 to 12 SQL (Oracle) read statements to retrieve data about an object.
Unfortunately it's running slowly (no SQL statement in particular; it's just that I have so many of them, and they take around 0.2 seconds per statement, which can mean over 2 seconds for the code to complete).
I am looking into ways of improving the performance. One way is to merge some of the tables into a single query (which can cut the combined time by about 0.5 seconds). However, it doesn't make sense to merge the rest, since there will only be data there under certain circumstances, and trying to determine when there is data there to marshal could get tricky.
I am considering introducing threading into my program, so that after the initial query I would spawn a thread for each of the other queries, so they are executed at the same time. However, I have never used threading and am wary of introducing deadlocks or other pitfalls.
Currently the other queries marshal their results into different sections of the SAME object. Would this cause any issues (i.e. since we are accessing/updating the same object in different threads, though through different sections/fields within the object)? Would it be better to return the results and marshal them into the object after all the threads have finished?
I know these types of questions are hard to answer since it's more about general advice, but I would appreciate it if anyone thought it was a good idea, or had other suggestions.
If you are doing only reading (SELECT from), don't worry about deadlocks: Oracle reads are (mostly) non-blocking. The biggest problem with threading queries to Oracle is how to deal with connections. Creating a connection, running a query and closing the connection is very, very, very bad: connections are expensive. They are also limited, so you don't want to create one million connections to execute your logic.
As a result, you would use some sort of connection pool and put your queries in a queue.
Also, I hope you are using bind variables and not string concatenation to pass queries to Oracle.
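For example, with the managed Oracle provider (assuming Oracle.ManagedDataAccess; table and column names are placeholders), bind variables look like this:

    using System;
    using Oracle.ManagedDataAccess.Client;

    // The :id bind variable lets Oracle reuse the parsed statement
    // instead of hard-parsing a new SQL string for every value.
    using (var conn = new OracleConnection(connectionString))
    using (var cmd = new OracleCommand("SELECT name FROM customers WHERE id = :id", conn))
    {
        cmd.Parameters.Add("id", OracleDbType.Int32).Value = 42;
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                Console.WriteLine(reader.GetString(0));
        }
    }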
In general, I would collect all the data (ideally in one query) and only then update the object. You could also consider breaking your object into sections.
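If you do thread it, a sketch of the "marshal after all threads finish" variant; RunQuery and MyObject are hypothetical names standing in for your data access and your result object:

    using System.Threading.Tasks;

    static async Task<MyObject> LoadAsync()
    {
        // Run the independent read statements concurrently,
        // each one on its own connection inside RunQuery.
        var t1 = Task.Run(() => RunQuery("select /* part 1 */"));
        var t2 = Task.Run(() => RunQuery("select /* part 2 */"));
        var t3 = Task.Run(() => RunQuery("select /* part 3 */"));
        await Task.WhenAll(t1, t2, t3);

        // Only one thread ever touches the object: marshalling happens
        // here, after every query has completed.
        return new MyObject { Part1 = t1.Result, Part2 = t2.Result, Part3 = t3.Result };
    }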
Threading works perfectly. Two years ago I did a project that used a multi-stage / multi-threading approach to push data into an Oracle database (and pull some data out of it for updates).
I basically used a staged approach (a request would go through multiple stages, get consumed there, and new data would be pushed to the next stage), and every stage used a configurable thread pool which would take a message, process it and post the new messages.
We used, I think, close to 200 threads at that time to process about a million SQL statements per minute (hitting an Oracle Exadata that was really getting some work out of that).
So, multithreading "just works" - obviously if you know how to do it, and you have to get your architecture and the SQL statements nice and non-blocking. Databases in general are perfectly capable of handling multiple threads.
Now, for details: THAT DEPENDS.
Example:
Currently the other queries marshal their results into different sections of the SAME object. Would this cause any issues (i.e. since we are accessing/updating the same object in different threads, though through different sections/fields within the object)?
Absolutely no problem as long as:
You make sure all updates are finished before moving the object to the next phase, and
The updates do not overlap or have an ordering dependency (1 must finish for 2 to have the required data).
These are implementation details, and it is really hard to give a generic answer for those (totally impossible). Especially as this is multithreading 101 - and has nothing to do with any database access.
In general - you will also have to tune the number of threads. .NET cannot do that itself - it will see that the CPU is not busy and spawn more threads, even if the database server is the bottleneck. This is why we went with multiple stages - so we could tune the number of threads depending on what they do (and the last stage used bulk inserts to move the aggregated data into temporary staging tables with a small number of threads, moving a lot of data in every statement - this needs some tuning possibilities to not totally overload the database side).
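One way to get that kind of tuning knob in .NET (a sketch; the limit of 8 and the SaveAsync/WriteToDatabaseAsync/Record names are assumptions, not anything from the project described) is to gate the database-facing work with a SemaphoreSlim, so the runtime can grow the thread pool without flooding the server:

    using System.Threading;
    using System.Threading.Tasks;

    // At most 8 database operations in flight, no matter how many
    // threads the runtime decides to spin up for the stage.
    static readonly SemaphoreSlim DbGate = new SemaphoreSlim(8);

    static async Task SaveAsync(Record record)
    {
        await DbGate.WaitAsync();
        try
        {
            await WriteToDatabaseAsync(record); // placeholder for the real insert
        }
        finally
        {
            DbGate.Release();
        }
    }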
A while ago, I wrote an application used by multiple users to handle trades creation.
I haven't done development for some time now, and I can't remember how I managed the concurrency between the users. Thus, I'm seeking some advice in terms of design.
The original application had the following characteristics:
One heavy client per user.
A single database.
Access to the database for each user to insert/update/delete trades.
A grid in the application reflecting the trades table. That grid being updated each time someone changes a deal.
I am using WPF.
Here's what I'm wondering:
Am I correct in thinking that I shouldn't care about the connection to the database for each application? Considering that there is a singleton in each, I would expect one connection per client with no issue.
How can I prevent concurrent accesses from conflicting? I guess I should lock when modifying the data; however, I don't remember how to.
How do I set up the grid to automatically update whenever my database is updated (by another user, for example)?
Thank you in advance for your help!
Consider leveraging Connection Pooling to reduce # of connections. See: http://msdn.microsoft.com/en-us/library/8xx3tyca.aspx
Lock as late as possible and release as soon as possible to maximize concurrency. You can use TransactionScope (see: http://msdn.microsoft.com/en-us/library/system.transactions.transactionscope.aspx and http://blogs.msdn.com/b/dbrowne/archive/2010/05/21/using-new-transactionscope-considered-harmful.aspx) if you have multiple DB actions that need to go together to maintain consistency, or just handle them in a DB stored proc. Keep your queries simple. Follow these tips to understand how locking works and how to reduce resource contention and deadlocks: http://www.devx.com/gethelpon/10MinuteSolution/16488
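A sketch of the TransactionScope route, with explicit options as the second linked article advises (table names and statements are placeholders):

    using System.Data.SqlClient;
    using System.Transactions;

    // Explicit options avoid TransactionScope's Serializable default,
    // which the "considered harmful" article warns about.
    var options = new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted };
    using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open(); // enlists in the ambient transaction
        using (var insert = new SqlCommand("INSERT INTO dbo.Trades (Price) VALUES (101.5)", conn))
            insert.ExecuteNonQuery();
        using (var update = new SqlCommand("UPDATE dbo.Positions SET Qty = Qty + 1 WHERE Id = 1", conn))
            update.ExecuteNonQuery();
        scope.Complete(); // both statements commit or roll back together
    }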
I am not sure about other databases, but for SQL Server you can use SqlDependency; see http://msdn.microsoft.com/en-us/library/a52dhwx7(v=vs.80).aspx
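A sketch of how SqlDependency is typically wired up (the query, table and the grid-refresh step are assumptions; the database needs Service Broker enabled):

    using System.Data.SqlClient;

    // Call once at application startup:
    // SqlDependency.Start(connectionString);

    static void Subscribe(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        // Notification queries need explicit columns and two-part table names.
        using (var cmd = new SqlCommand("SELECT Id, Price FROM dbo.Trades", conn))
        {
            var dependency = new SqlDependency(cmd);
            // Fires once per change; re-subscribe and refresh the grid.
            dependency.OnChange += (s, e) => Subscribe(connectionString);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                // read the rows and (re)populate the grid here
            }
        }
    }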
Concurrency is usually managed by the DBMS using locks. Locks are a kind of semaphore that grants exclusive access to a certain resource and causes other accesses to be restricted or queued (only restricted in the case you use uncommitted reads).
The number of connections itself does not pose a problem while you are not reaching heights where you might touch on the max_connections setting of your DBMS. Otherwise, you might get a problem connecting to it for maintenance purposes or for shutting it down.
DBMSes usually use either table locks (MyISAM) or row locks (InnoDB, most other DBMSes). The type of lock determines how much data it covers. Table locks can be very fast but are usually considered inferior to row-level locks.
Row-level locks occur inside a transaction (implicit or explicit). When you manually start a transaction, you begin your transaction scope. Until you close that scope, all changes you make are attributed to this exact transaction, and they obey the ACID paradigm.
Transaction scope and how to use it is a topic far too long for this platform, if you want, I can post some links that carry more information on this topic.
For the automatic updates, most databases support some kind of trigger mechanism: code that runs on specific actions in the database (for instance the creation of a new record, or the change of a record). You could put your code inside such a trigger. However, you should only inform a receiving application of the changes, not actually "do" the changes from the trigger, even if the language makes it possible. Remember that the action which fired the trigger is suspended until your trigger code finishes, so a lean trigger is best, if one is needed at all.
Here I am dealing with a database containing tens of millions of records. I have an application which connects to the database, gets all the data from a single column in a table and does some operation on it and updates it (for SQL Server - using cursors).
For millions of records it is taking a very, very ... long time to update. So I want to make it faster by
using multiple threads with an independent connection for each thread.
or
by using a single connection throughout all the threads to fire the update queries.
Which one is faster, or if you have any other ideas, please explain.
I need a solution which is independent of database type, or even if you know specific solutions for each type of DB, please reply.
The speedup you're trying to achieve won't work. On the contrary, it will slow down the overall processing, as the database now also has to keep multiple connections/sessions/transactions in sync.
Stick with as few connections/transactions as possible for repetitive and comparable operations.
If it takes too long for your taste, try to analyze whether the queries can be optimized. Also have a look at database-specific extensions (i.e. bulk operations) suitable for your problem.
It all depends on the database and the hardware it is running on.
Multiple connections help if the database can make use of concurrent processing and avoid contention on shared resources (e.g. page-based locks span multiple records, record-based locks do not). Shared resources here include hardware: a single-core box will not be able to execute multiple CPU-intensive activities (e.g. parsing SQL) truly in parallel.
Network latency is something you might alleviate with concurrent inserts, even if the database itself is not able to exploit concurrency.
As with any question of performance, there is no substitute for testing in your specific scenario.
If possible, use a stored procedure to do all the processing and update the records.
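A sketch of what that call looks like from C#; the procedure name is an assumption:

    using System.Data;
    using System.Data.SqlClient;

    // One round trip; the set-based processing runs entirely server-side.
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.ProcessAndUpdateRecords", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.CommandTimeout = 0; // don't time out a long-running batch
        conn.Open();
        cmd.ExecuteNonQuery();
    }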