"Timeout expired" exception on code exclusively using using statements - c#

I have a multi-threaded application that talks to SQL server via Linq to Sql. The app is running fine on a quad core (Intel I-7) machine when the number of threads is artificially kept at 8:
Parallel.ForEach(allIds,
new ParallelOptions { MaxDegreeOfParallelism = 8 },
x => DoTheWork(x));
When the number of threads is left to the system to decide:
Parallel.ForEach(allIds, x => DoTheWork(x));
After running for a little while, I get the following exception:
Timeout expired. The timeout period elapsed prior to obtaining a
connection from the pool. This may have occurred because all pooled
connections were in use and max pool size was reached.
There are only two patterns in my app for calling SQL:
first:
using (var dc = new MyDataContext())
{
//do stuff
dc.SafeSubmitChanges();
}
second:
using (var dc = new MyDataContext())
{
//do some other stuff
DoStuff(dc);
}
.....
private void DoStuff(DataContext dc)
{
//do stuff
dc.SafeSubmitChanges();
}
I decided to throttle the calls by this form of logic:
public static class DataContextExtention
{
public const int SQL_WAIT_PERIOD = 5000;
public static void SafeSubmitChanges(this DataContext dc)
{
try
{
dc.SubmitChanges();
}
catch (Exception e)
{
if (e.Message ==
"Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.")
{
System.Data.SqlClient.SqlConnection.ClearAllPools();
System.Threading.Thread.Sleep(SQL_WAIT_PERIOD);
dc.SafeSubmitChanges();
}
else
{
throw;
}
}
}
}
This made absolutely no difference. Once the app throws the first exception of this kind, all sorts of random places in the app (even lines of code that have nothing to do with SQL server) start throwing this exception.
Q1: Isn't religiously employing using statement supposed to guard against exactly this scenario?
Q2: What is wrong and how do I fix this?
Note: There are approx 250,000 ids. I also tested at MaxDegreeOfParallelism = 16 and I get the same exception.

I suppose it depends on how many items there are in allIds. If Parallel.ForEach creates too many concurrent tasks, each one may try to open a connection to the database at the same time, exhausting the connection pool and leaving it unable to provide connections to all the tasks that request one.
If satisfying a request to the connection pool takes longer than the timeout, that error message makes sense. When you set MaxDegreeOfParallelism = 8, you have no more than 8 concurrent tasks, and thus no more than 8 connections "checked out" from the pool. Before a task completes (and Parallel.ForEach has a free slot to run a new task), its connection is returned to the pool, so when Parallel.ForEach runs the next item the pool can satisfy the next request. That is why you don't see the issue when you artificially limit concurrency.
EDIT 1
@hatched's suggestion above is on the right track: increase the pool size. However, there is a caveat. Your bottleneck likely isn't computing power but database activity. What I suspect (speculation, admittedly) is happening is that while talking to the database, the thread can't do much and blocks (or switches to another task). The thread pool sees that there are more tasks pending while the CPU is underutilized (because of outstanding I/O operations), and so decides to take on more tasks to use the available CPU slack. This of course just saturates the bottleneck even more, and you are back to square one. So even if you increase the connection pool size, you're likely to keep running into the wall until your pool size is as big as your task list. As such, you may actually want bounded parallelism such that it never exhausts the thread pool (and fine-tune by making the thread pool larger or smaller depending on DB load, etc.).
One way to try to find out if the above is true is to see why connections are taking so long and not getting returned to the pool. I.e. analyze to see if there is db contention that is slowing all connections down. If so, more parallelization won't do you any good (in-fact, that would be making things worse).
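The effect of a bound can be illustrated without a database. The following is only a minimal sketch: the class and method names are made up for illustration, and Thread.Sleep stands in for a blocking database call that holds one pooled connection. It reports the highest number of work items that were ever in flight at once.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original application.
public static class BoundedParallelismDemo
{
    // Runs itemCount fake work items and returns the highest number that
    // were in flight at the same time.
    public static int MeasurePeakConcurrency(int itemCount, int maxDegreeOfParallelism)
    {
        int inFlight = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, _ =>
        {
            // "Check out" a connection.
            int now = Interlocked.Increment(ref inFlight);

            // Record a new peak if this is the highest count seen so far.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
            }

            Thread.Sleep(5);                      // simulated blocking I/O
            Interlocked.Decrement(ref inFlight);  // "return" it to the pool
        });

        return peak;
    }
}
```

With MaxDegreeOfParallelism set, the returned peak cannot exceed the bound, so a pool that is at least that large is never asked for more connections than it has (assuming each work item holds one connection at a time).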

I was thinking the following might help. In my experience with Oracle, the DB connection pool has caused me issues before, so I thought there might be a similar issue with the SQL Server connection pool. Knowing the default connection settings and seeing connection activity on the DB is often useful information.
If you are using SQL Server 2008, the default maximum connection pool size is 100 and the default connection timeout is 15 seconds. I would have the SQL admin track how many connections you're making while running the app and see whether you're putting load on the DB server. Maybe add some performance counters as well. Since this looks like a SQL Server exception, I would get some metrics to see what is happening. You could also use IntelliTrace to help see DB activity.
Intellitrace Link: http://www.dotnetcurry.com/showarticle.aspx?ID=943
Sql Server 2008 Connection Pool Link: http://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.110).aspx
Performance Counters Link: http://msdn.microsoft.com/en-us/library/ms254503(v=vs.110).aspx
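For reference, both defaults mentioned above can be overridden in the connection string. This is a sketch only: "Max Pool Size" and "Connect Timeout" are SqlClient connection string keywords, while the server name, database name and the PoolSettings class are placeholders invented for the example. It uses the generic DbConnectionStringBuilder so that it does not depend on a particular SqlClient package.

```csharp
using System.Data.Common;

// Hypothetical helper; names and values are placeholders.
public static class PoolSettings
{
    public static string BuildConnectionString(int maxPoolSize, int connectTimeoutSeconds)
    {
        var builder = new DbConnectionStringBuilder();
        builder["Data Source"] = "myServer";          // placeholder server name
        builder["Initial Catalog"] = "myDatabase";    // placeholder database name
        builder["Integrated Security"] = true;
        builder["Max Pool Size"] = maxPoolSize;                 // SqlClient default is 100
        builder["Connect Timeout"] = connectTimeoutSeconds;     // SqlClient default is 15 seconds
        return builder.ConnectionString;
    }
}
```

Raising the pool size only moves the limit; whether the database can actually serve that many concurrent connections is something the metrics should tell you.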

I could be way off target here, but I wonder if the problem is not being caused as a side effect of this fact about connection pooling (Taken from here, emphasis mine):
When connection pooling is enabled, and if a timeout error or other login error occurs, an exception will be thrown and subsequent connection attempts will fail for the next five seconds, the "blocking period". If the application attempts to connect within the blocking period, the first exception will be thrown again. Subsequent failures after a blocking period ends will result in a new blocking periods that is twice as long as the previous blocking period, up to a maximum of one minute.
So in other words, it's not that you are running out of connections per se, it's that something is failing on one or more of the parallel operations, perhaps because the poor table is caving under the pressure of parallel writes - have you profiled what's happening database-side to see if there are any problems with contention on the table during the operation?
This could then cause other requests for connections to start to back up due to the "penalty" described above; hence the exceptions and once you start to get one, your SafeSubmit method can only ever make things worse because it keeps retrying an already banjaxed operation.
This explanation would also heavily support the idea that the real bottleneck here is the database, and that maybe it's not a good idea to try to hammer a table with unbounded parallel IO; it's better to measure and come up with a maximum DOP based on what the database can bear (which could well be different for different hardware).
Also, as regards your first question, using only guarantees that your DataContext object will be auto-magically Dispose()d when it goes out of scope, so it's not at all designed to protect in this scenario - all it is is syntactic sugar for
var dc = new MyDataContext(); // declared outside the try so that finally can see it
try
{
//do stuff with dc
}
finally
{
if (dc != null) dc.Dispose();
}
and in this case that's not a guard against there being (too) many DataContexts currently trying to connect to the database at the same time.

Are you sure you are not facing connection leaks? Please check out the accepted answer at this link
Moreover, have you already set MultipleActiveResultSets = true?
From MSDN:
When true, an application can maintain multiple active result sets
(MARS). When false, an application must process or cancel all result
sets from one batch before it can execute any other batch on that
connection. Recognized values are true and false.
For more information, see Multiple Active Result Sets (MARS).

Related

Async-Await api performance bottleneck at DB in .net5.0 web api

.net5.0 Web API
Because a few APIs were used extensively by clients, overall performance was deteriorating. So we decided to convert those selected APIs to async/await to reduce the load on IIS.
After implementing the same, we got better performances at about 100 parallel requests (via jMeter) in our local environment. But as soon as we increased the load to 200 requests, it started giving the below error: "The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."
We realized that we didn't exactly improve the performance but we shifted the bottleneck from IIS to DB.
To solve this we tried changing the connection string (My SQL Server) properties i.e Max Pool size, ConnectRetryCount, ConnectRetryInterval etc. which definitely worked and gave us better results but everything came with a trade-off. Increasing Max Pool size would utilize DB server resources.
Also, we can never predict how many parallel requests will reach our API: if the max pool size is 400 and 450 parallel requests arrive, it would still break.
We identified a few options, such as using SemaphoreSlim to limit the number of requests reaching the DB, so that we don't open too many connections and thereby avoid the timeout errors. But then we are not able to use the async nature of the API to its fullest.
Is there an optimized solution to this, or are we missing something?
Also, how safe is SemaphoreSlim if we choose to use it?
IIS and Kestrel can be configured to limit the maximum number of connections, so why use your own home-made solution?
You can achieve the same goal by increasing the connection timeout instead of using SemaphoreSlim.
If you want to increase the throughput capacity of your app, you should start by optimizing your queries; if that is not enough, consider increasing the database server's hardware resources.
If you have a max pool size of 400 and 450 concurrent requests arrive, it would not necessarily break. If there are no available connections in the pool, Connection.OpenAsync will wait until a connection is available. So if the queries are fast enough, it will not time out.
Here are two suggestions that could improve the performance of the system:
Make the work shorter so that there is less parallel work:
Use a stored procedure that does the work locally rather than moving data between the API and the DB
Scale the database up
Optimize the queries
Confirm indexes
Use caching
etc.
Accept the requests but put them on a queue and make the consumer of the queue process them in batches rather than one by one.
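The queue-and-batch suggestion could look roughly like the sketch below. It assumes System.Threading.Channels (available in .NET Core 3.0 and later, so also in .NET 5); the BatchingQueue class, its members and the processBatch callback are hypothetical names introduced for illustration. The request handlers would enqueue work, and a single background consumer would write each batch to the database over one connection.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical type, sketched for illustration only.
public sealed class BatchingQueue<T>
{
    private readonly Channel<T> _channel;
    private readonly int _maxBatchSize;

    public BatchingQueue(int capacity, int maxBatchSize)
    {
        // A bounded channel applies back-pressure: WriteAsync waits while the queue is full.
        _channel = Channel.CreateBounded<T>(capacity);
        _maxBatchSize = maxBatchSize;
    }

    // Called by request handlers.
    public ValueTask EnqueueAsync(T item) => _channel.Writer.WriteAsync(item);

    // Signals that no more items will be enqueued.
    public void Complete() => _channel.Writer.Complete();

    // Drains the queue, handing items to processBatch in groups of at most
    // maxBatchSize. Returns once Complete() was called and the queue is empty.
    public async Task ConsumeAsync(Func<IReadOnlyList<T>, Task> processBatch)
    {
        ChannelReader<T> reader = _channel.Reader;
        while (await reader.WaitToReadAsync())
        {
            var batch = new List<T>(_maxBatchSize);
            while (batch.Count < _maxBatchSize && reader.TryRead(out T item))
            {
                batch.Add(item);
            }

            if (batch.Count > 0)
            {
                await processBatch(batch);
            }
        }
    }
}
```

With one consumer the database sees at most one connection from this path regardless of how many requests arrive; the trade-off is that callers no longer get their result in the same request unless you add a completion mechanism on top.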

Parallel programming for Windows Service

I have a Windows Service that has code similar to the following:
List<Buyer> buyers = GetBuyers();
var results = new List<Result>();
Parallel.ForEach(buyers, buyer =>
{
// do some prep work, log some data, etc.
// call out to an external service that can take up to 15 seconds each to return
results.Add(Bid(buyer)); // note: List<T>.Add is not thread-safe; guard it with a lock or use a concurrent collection
});
// Parallel foreach must have completed by the time this code executes
foreach (var result in results)
{
// do some work
}
This is all fine and good and it works, but I think we're suffering from a scalability issue. We average 20-30 inbound connections per minute and each of those connections fire this code. The "buyers" collection for each of those inbound connections can have from 1-15 buyers in it. Occasionally our inbound connection count sees a spike to 100+ connections per minute and our server grinds to a halt.
CPU usage is only around 50% on each server (two load balanced 8 core servers) but the thread count continues to rise (spiking up to 350 threads on the process) and our response time for each inbound connection goes from 3-4 seconds to 1.5-2 minutes.
I suspect the above code is responsible for our scalability problems. Given this usage scenario (parallelism for I/O operations) on a Windows Service (no UI), is Parallel.ForEach the best approach? I don't have a lot of experience with async programming and am looking forward to using this opportunity to learn more about it, figured I'd start here to get some community advice to supplement what I've been able to find on Google.
Parallel.ForEach has a terrible design flaw. It is prone to consume all available thread-pool resources over time. The number of threads that it will spawn is literally unlimited. You can get up to 2 new ones per second, driven by heuristics that nobody understands. The CoreCLR has a hill climbing algorithm built into it that just doesn't work.
call out to an external service
Probably, you should find out what's the right degree of parallelism calling that service. You need to find out by testing different amounts.
Then, you need to restrict Parallel.ForEach to only spawn as many threads as you want at a maximum. You can do that using a fixed concurrency TaskScheduler.
Or, you change this to use async IO and use SemaphoreSlim.WaitAsync. That way no threads are blocked. The pool exhaustion is solved by that and the overloading of the external service as well.
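A minimal sketch of the SemaphoreSlim.WaitAsync variant follows. The ThrottledAsync class and its ForEachAsync method are hypothetical names; the sketch assumes the call to the external service is available as a genuinely asynchronous method, which the question does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, sketched for illustration only.
public static class ThrottledAsync
{
    // Runs work for every item but lets at most maxConcurrency calls be in
    // flight at once. Waiting happens via WaitAsync, so no thread is blocked.
    // Results are returned in the same order as the input items.
    public static async Task<TResult[]> ForEachAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList starts every task; the gate limits how many get past WaitAsync.
            List<Task<TResult>> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    return await work(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            return await Task.WhenAll(tasks);
        }
    }
}
```

In the service this might be called as `var results = await ThrottledAsync.ForEachAsync(buyers, 4, buyer => BidAsync(buyer));`, where BidAsync is an assumed async version of Bid and 4 is whatever degree of parallelism your testing shows the external service can take. Because the results come back as an array, the shared List is no longer needed.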

Processing loop restricted by a connection pool (?)

I'm looking for some strategic help here, since I am new to TPL.
Situation
I have an application that coordinates data between 2 disparate LOB systems, ones that do not talk to each other. So, it looks a bit like:
[ System 1 ] < ----- [ App ] ----- > [ System 2 ]
During its processing, the app performs the following tasks:
App creates a connection to System 1. This connection must screen-scrape a web application, so it uses a and System 2, verifying each one is available.
App requests list of IDs from System A.
This list is run through, item by item. Processing that list:
App requests data from System 1. This system does not provide any service interface, so the app uses a WebRequest to both GET and POST requests to System 1. In addition to web page data scraped, a file may also be downloaded.
With data from System 1, App submits data to System 2 via several web service calls. Several calls may be made, and a file may be uploaded.
There are often tens of thousands of items in the loop. There is no dependency between these items, so they seem to be a good candidate for Task-based processing.
However, at most, there can be about 20 connections to System 1 and about 10 connections to System 2. So, the simple idea of just creating and destroying sessions for each item in the loop (like you might do in a simple Parallel.ForEach Task) would be prohibitively costly. Rather, I want to share the connections, in effect, creating a connection pool of sorts. That pool would be created before the tasks started up. When each Task starts its work, it would basically wait until it could get a connection from the pool. Once the task is complete, the connection would be released, and another Task could get ahold of it. In this case, the Scheduler limit is not just the CPUs; it's also the maximum number of connections to System 2.
Desire
I'm looking for the approach. I don't mind doing the work to figure out the implementation, but I need the best strategic approach.
How do I get the task loop to work with a limited number of these connections? Or do I have to go back to the old style of Thread allocation, and just manually pass the freed up connections as the threads complete their tasks? Some kind of mutex array? If so, how will the Tasks grab an open connection? Some type of concurrent bag or am I just going the wrong way?
Any help would be greatly appreciated.
I think a BlockingCollection for each connection pool will work well. If a thread attempts to get a connection from an empty pool, that thread will be blocked until another thread returns a connection to the pool.
You should also set MaxDegreeOfParallelism to the size of the bigger pool, to make sure there aren't unnecessarily many threads, most of them waiting to get a connection from the pool.
With that, your code could look like this:
var connection = serviceAConnections.Take();
// use the connection
serviceAConnections.Add(connection);
But a better approach might be to add a level of abstraction over that:
using (var connectionHolder = serviceAConnections.Get())
{
var connection = connectionHolder.Connection;
// use the connection
}
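The Get() method and the holder type used above are not part of BlockingCollection; they would have to be written. One possible shape, offered as a sketch with hypothetical names (ConnectionPool, ConnectionHolder, AvailableCount), is:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical pool type wrapping a BlockingCollection of pre-created connections.
public sealed class ConnectionPool<T>
{
    private readonly BlockingCollection<T> _available;

    public ConnectionPool(IEnumerable<T> connections)
    {
        _available = new BlockingCollection<T>(new ConcurrentBag<T>(connections));
    }

    // Number of connections currently not checked out.
    public int AvailableCount => _available.Count;

    // Blocks until a connection is free, then wraps it in a holder that
    // returns it to the pool when disposed.
    public ConnectionHolder Get() => new ConnectionHolder(this, _available.Take());

    public sealed class ConnectionHolder : IDisposable
    {
        private readonly ConnectionPool<T> _pool;
        private bool _returned;

        internal ConnectionHolder(ConnectionPool<T> pool, T connection)
        {
            _pool = pool;
            Connection = connection;
        }

        public T Connection { get; }

        public void Dispose()
        {
            if (_returned) return;   // tolerate a second Dispose on the same holder
            _returned = true;
            _pool._available.Add(Connection);
        }
    }
}
```

The benefit over calling Take() and Add() by hand is that the using block returns the connection even when the work in between throws.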

Halt Linq query if it will take 'too' long

Currently I have the need to create a reporting program that runs reports on many different tables within a SQL database. Multiple different clients require this functionality but some clients have larger databases than others. What I would like to know is whether it is possible to halt a query after a period of time if it has been taking 'too' long.
To give some context, some clients have tables with in excess of 2 million rows, although a different client may have only 50k rows in the same table. I want to be able to run the query for say 20 seconds and if it has not finished by then return a message to the user to say that the result set will be too large and the report needs to be generated outside of hours as we do not want to run resource intensive operations during the day.
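Set the command timeout on the DataContext via the CommandTimeout property (the Connect Timeout in the connection string only limits how long opening a connection may take). When the timeout expires, your query will be cancelled and you will get an exception; with SqlClient this is a SqlException reporting the timeout.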
You cannot be sure that the query is cancelled on the server the very instant the timeout occurs, but in most cases it will cancel rather quickly. For details read the excellent article "There's no such thing as a query timeout...". The important part from there is:
A client signals a query timeout to the server using an attention
event. An attention event is simply a distinct type of TDS packet a
SQL Server client can send to it. In addition to connect/disconnect,
T-SQL batch, and RPC events, a client can signal an attention to the
server. An attention tells the server to cancel the connection's
currently executing query (if there is one) as soon as possible. An
attention doesn't rollback open transactions, and it doesn't stop the
currently executing query on a dime -- the server aborts whatever it
was doing for the connection at the next available opportunity.
Usually, this happens pretty quickly, but not always.
But remember, it will differ from provider to provider and it might even be subject to change between server versions.
You can do that easily if you run the query on a background thread. Make the main thread start a timer and spawn a background thread that runs the query. If the background thread hasn't returned a result when the 20 seconds are over, the main thread can cancel it.
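The background-thread idea can be sketched with a Task instead of a hand-rolled thread and timer. QueryTimeout and TryRun are hypothetical names, and the query is represented by a plain delegate. Note the limitation: cancelling only signals a token, so the query stops early only if it observes that token; otherwise it keeps running in the background (and on the server) after the caller has given up.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, sketched for illustration only.
public static class QueryTimeout
{
    // Runs query on a thread-pool thread and waits up to timeout for it.
    // Returns true and the result if it finished in time, false otherwise.
    public static bool TryRun<T>(Func<CancellationToken, T> query, TimeSpan timeout, out T result)
    {
        using (var cts = new CancellationTokenSource())
        {
            // Capture the token up front so the lambda never touches a disposed source.
            CancellationToken token = cts.Token;
            Task<T> task = Task.Run(() => query(token));

            if (task.Wait(timeout))
            {
                result = task.Result;
                return true;
            }

            cts.Cancel();            // a request to stop; it does not abort the thread
            result = default(T);
            return false;
        }
    }
}
```

A caller would use the false return value to show the "report is too large, run it outside of hours" message. Setting CommandTimeout remains the more direct option, since it also tells the server to cancel the query.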

Why so many sp_resetconnections for C# connection pooling?

We have a web service coded in C# that makes many calls to MS SQL Server 2005 database. The code uses Using blocks combined with C#'s connection pooling.
During a SQL trace, we saw many, many calls to "sp_resetconnection". Most of these are short < 0.5 sec, however sometimes we get calls lasting as much as 9 seconds.
From what I've read sp_resetconnection is related to connection pooling and basically resets the state of an open connection. My questions:
Why does an open connection need its state reset?
Why so many of these calls?
What could cause a call to sp_resetconnection to take a non-trivial amount of time?
This is quite the mystery to me, and I appreciate any and all help!
The reset simply resets things so that you don't have to reconnect to reset them. It wipes the connection clean of things like SET or USE operations so each query has a clean slate.
The connection is still being reused. Here's an extensive list:
sp_reset_connection resets the following aspects of a connection:
It resets all error states and numbers (like @@error)
It stops all EC's (execution contexts) that are child threads of a parent EC executing a parallel query
It will wait for any outstanding I/O operations
It will free any held buffers on the server by the connection
It will unlock any buffer resources that are used by the connection
It will release all memory allocated owned by the connection
It will clear any work or temporary tables that are created by the connection
It will kill all global cursors owned by the connection
It will close any open SQL-XML handles that are open
It will delete any open SQL-XML related work tables
It will close all system tables
It will close all user tables
It will drop all temporary objects
It will abort open transactions
It will defect from a distributed transaction when enlisted
It will decrement the reference count for users in current database; which release shared database lock
It will free acquired locks
It will release any handles that may have been acquired
It will reset all SET options to the default values
It will reset the @@rowcount value
It will reset the @@identity value
It will reset any session level trace options using dbcc traceon()
sp_reset_connection will NOT reset:
Security context, which is why connection pooling matches connections based on the exact connection string
If you entered an application role using sp_setapprole, since application roles can not be reverted
The transaction isolation level(!)
Here's an explanation of What does sp_reset_connection do? which says, in part "Data access API's layers like ODBC, OLE-DB and SqlClient call the (internal) stored procedure sp_reset_connection when re-using a connection from a connection pool. It does this to reset the state of the connection before it gets re-used." Then it gives some specifics of what that system sproc does. It's a good thing.
sp_resetconnection gets called every time you request a connection from a pool.
It has to, since the pool cannot guarantee that the user (you, the programmer, probably :) has left the connection in a proper state; e.g. returning an old connection with uncommitted transactions would be... bad.
The number of calls should be related to the number of times you fetch a connection.
As for some calls taking non-trivial amount of time, I'm not sure. Could be the server is just very busy processing other stuff at that time. Could be network delays.
Basically the calls are there to clear out state information. If you have ANY open DataReaders it will take a LOT longer to occur. This is because your DataReaders are only holding a single row, but could pull more rows. They each have to be cleared as well before the reset can proceed. So make sure you have everything in using() statements and are not leaving things open in some of your statements.
How many total connections do you have running when this happens?
If you have a max of 5 and you hit all 5 then calling a reset will block - and it will appear to take a long time. It really is not, it is just blocked waiting on a pooled connection to become available.
Also if you are running on SQL Express you can get blocked due to threading requirements very easily (could also happen in full SQL Server, but much less likely).
What happens if you turn off connection pooling?
