I inherited a ton of C# .NET code that apparently, somewhere, is not closing its connections to SQL Server, and I get:
The timeout period elapsed prior to obtaining a connection from the pool: This may have occurred because all pooled connections were in use and max pool size was reached
This happens quite a bit, usually around the same time of day, but I don't know of anything that is scheduled. Manually searching every little place in the code is not really feasible. Any ideas how to fix this without going over every single function in the code?
The problem with not wanting to go through the code is that it's very likely not a single instance or area, and it may depend on where you're getting the most traffic in the production environment. You could have a fairly innocuous-looking query that could be improved slightly with a .AsNoTracking() or a better join/include, rather than a single big query that is poorly optimized. There are also instances where the entire design is flawed (I came into a project once where SQL connections were attached to individual classes, so there was no real way to wrap the connection in a using statement; we had to restructure all of it).
If you're using custom transactions or raw SqlConnection requests, make sure you're wrapping them in a using block, with a try/catch/finally and an explicit Dispose. Relying on the garbage collector is not a good idea, since there's no guarantee it will release the connection promptly; only disposing the connection (explicitly or via using) returns it to the pool right away.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    try
    {
        // Transactions, SQL commands, etc.
    }
    catch (SqlException ex)
    {
        // Transaction rollback, error handling
    }
    finally
    {
        // Explicitly release the connection back to the pool.
        // (Redundant inside a using block, but it makes the intent obvious.)
        conn.Dispose();
    }
}
If you're strictly using a database context, some statements can be optimized marginally by using .AsNoTracking() for instances where you are only retrieving a result-set but have no intention of modifying the results from that query. Something like the following:
var tier = _dbContext.Tier.AsNoTracking()
.Include(t => t.NameSchema)
.Include(t => t.TierPackageGroups)
.Where(t => t.TierNumber == tierNumber).FirstOrDefault();
Make sure you're using joins or .Include wherever possible as opposed to lazy loading.
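To make that concrete, here is a minimal sketch reusing the hypothetical Tier / TierPackageGroups model from the snippet above, and assuming lazy loading is enabled on the context:
// Lazy loading: one query for the tiers, plus one extra query per tier the
// first time TierPackageGroups is touched - the classic N+1 problem.
var tiers = _dbContext.Tier.ToList();
foreach (var t in tiers)
{
    var groupCount = t.TierPackageGroups.Count; // each access triggers its own query
}

// Eager loading: the related rows come back in the same round trip.
var tiersWithGroups = _dbContext.Tier
    .AsNoTracking()
    .Include(t => t.TierPackageGroups)
    .ToList();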
It's not the answer that you want to hear, and I've been in your shoes, but most likely the best solution is to go through the code. You could always start by cleaning up a few areas, gauging impact, then a few more areas, gauging impact again, and so on. This sort of incremental investigation can help narrow your search as you see improvements in some areas and not in others.
If you're convinced it has to be a single query that you want to find, then there are tools for evaluating SQL performance, but I usually pull in a DBA to handle that, so I'm not familiar with those tools. If you're using Azure, there is a panel where you can see the queries organized by performance impact. It's not going to point you to the code, but it could give you a hint.
Since the question specifically asks for a solution that doesn't require searching the code, the one other answer that will work is to just throw more hardware at the problem, but this is a poor stop-gap that can be costly and ultimately just delays the problem, or only solves it 99% of the time (until you hit a peak traffic period, for instance). One client we had a few years back probably spent more than $50k in additional hosting costs when they could've had us fix the issue for under $5k at the time of the initial request.
Related
How many concurrent statements does C# SqlConnection support?
Let's say I am working on a Windows service running 10 threads. All threads use the same SqlConnection object but different SqlCommand objects and perform operations like select, insert, update and delete on either different tables or the same table but different data. Will it work? Will a single SqlConnection object be able to handle 10 simultaneous statements?
You can technically have multiple "in-flight" statements, but only one actually executing.
A single SqlConnection maps to a single connection and session in SQL Server, and a session can only have a single request active at a time. If you enable MultipleActiveResultSets (MARS), you can start a new query before the previous one is finished, but the statements are interleaved, never run in parallel.
MARS enables the interleaved execution of multiple requests within a single connection. That is, it allows a batch to run, and within its execution, it allows other requests to execute. Note, however, that MARS is defined in terms of interleaving, not in terms of parallel execution.
And:
execution can only be switched at well defined points.
https://learn.microsoft.com/en-us/sql/relational-databases/native-client/features/using-multiple-active-result-sets-mars?view=sql-server-ver15
So you can't even guarantee that another statement will run whenever one becomes blocked. If you want to run statements in parallel, you need to use multiple SqlConnections.
Note also that a single query might use a parallel execution plan, and have multiple tasks running in parallel.
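To make that concrete, here is a minimal sketch (the connection string, table names, and the CountRowsAsync helper are all made up for illustration) of running two statements genuinely in parallel by giving each one its own pooled SqlConnection:
using System;
using System.Data.SqlClient;
using System.Threading.Tasks;

static class ParallelQueries
{
    // Hypothetical connection string - replace with your own.
    const string ConnStr = "Server=.;Database=MyDb;Integrated Security=true";

    static async Task<int> CountRowsAsync(string sql)
    {
        // Each call borrows its own physical connection from the pool.
        using (var conn = new SqlConnection(ConnStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            await conn.OpenAsync();
            return (int)await cmd.ExecuteScalarAsync();
        }
    }

    static async Task Main()
    {
        // Two separate connections, so the statements can truly run in parallel,
        // unlike MARS, which only interleaves them on one connection.
        var t1 = CountRowsAsync("SELECT COUNT(*) FROM dbo.Orders");
        var t2 = CountRowsAsync("SELECT COUNT(*) FROM dbo.Customers");
        await Task.WhenAll(t1, t2);
        Console.WriteLine($"{t1.Result} orders, {t2.Result} customers");
    }
}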
David Browne gave you the answer you asked for, but there might be something else you need to know:
Let's say I am working on a Windows service running 10 threads. All threads use the same SqlConnection object but different SqlCommand objects and perform operations like select, insert, update and delete on either different tables or the same table but different data.
This design just seems wrong on several fronts:
You keep a disposable resource around and open. My rule for disposable stuff is: "Create. Use. Dispose. All in the same piece of code, ideally using a using block." Keeping disposable stuff around, or even sharing it between threads, is just not worth the danger of forgetting to close it.
There is no performance advantage: SqlConnection uses internal connection pooling without any side effects. And even if there were a relevant speed advantage, it would not be worth the dangers.
You are using multithreading with database access. Multithreading is one way to implement multitasking, but not one you should use until you need it. Multithreading is only useful for CPU-bound work; otherwise you should generally be using async/await or similar approaches. DB operations are either disk or network bound.
There is one exception to this rule, and that is if your application is a server. Servers are the rare example of something being pleasingly parallel, so having a large thread pool to process incoming requests in parallel is very common. It is rather rare that you write one of those yourself, however; mostly you just run your code in an existing server infrastructure that deals with that.
If you do have heavy CPU work, chances are you are retrieving too much. It is a very common beginner's mistake to retrieve a lot and then do the filtering in C# code. Do not do that. Do as much filtering and processing as possible in the query (see the sketch below); you will not be able to beat the speed of the DB server, and at best you tie up your network pointlessly.
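A minimal sketch of those last points together (the table, columns, and connection string are hypothetical): open the connection only for the duration of the call, use async/await instead of extra threads, and push the filtering into the SQL rather than doing it in C#:
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading.Tasks;

static async Task<List<string>> GetOverdueCustomerNamesAsync(string connStr)
{
    var names = new List<string>();

    // Create. Use. Dispose. All in one place - no long-lived shared connection.
    using (var conn = new SqlConnection(connStr))
    using (var cmd = new SqlCommand(
        "SELECT Name FROM dbo.Customers WHERE Balance > @limit", conn)) // filter in SQL, not in C#
    {
        cmd.Parameters.AddWithValue("@limit", 1000m);
        await conn.OpenAsync();
        using (var reader = await cmd.ExecuteReaderAsync())
        {
            while (await reader.ReadAsync())
                names.Add(reader.GetString(0));
        }
    }
    return names;
}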
We are experiencing an issue where several hundred threads are trying to update a table ID, similar to this post, and sometimes encountering errors such as:
Cannot insert duplicate key in object dbo.theTable. The duplicate key value is (100186).
The method that is being executed hundreds of times in parallel executes several stored procedures:
using (var createTempTableCommand = new SqlCommand())
{
createTempTableCommand.CommandText = createTempTableScript;
createTempTableCommand.Connection = omniaConnection;
createTempTableCommand.ExecuteNonQuery();
}
foreach (var command in listOfSqlCommands)
{
using (var da = new SqlDataAdapter(command))
{
da.Fill(dtResults);
}
}
In order to recreate such an environment/scenario, is it advisable to simply record a trace and then replay it?
How do we recreate an environment with high concurrency?
1. You can avoid all deadlocks/dirty reads only if you rewrite your solution to run sequentially instead of in parallel.
2. You can accept some errors and create appropriate error handling. A blocked run, or one that fails with a duplicate key, can simply be started again.
3. You can try to rewrite your solution so that no two threads touch the same rows at the same time. You would have to change your transaction isolation level (https://msdn.microsoft.com/en-us/library/ms709374(v=vs.85).aspx) and change your locking to row locking (probably a combination of ROWLOCK and UPDLOCK hints). This will minimize your errors, but it cannot handle all of them.
So I recommend option 2. In some solutions it is better to run the command without a transaction - you can then handle failures without blocking other threads and enforce the relations in a following step.
And for the "similar post" - the same approach: error handling in your app will serve you better. Avoid cursor-based solutions like the one in that post, because they go against database fundamentals. Collect data into sets and work with sets.
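As an illustration of option 2, here is a rough retry sketch. The DoInsert method is hypothetical, and 2627 (with 2601 for unique indexes) is the SQL Server error number for a duplicate key violation:
const int DuplicateKeyErrorNumber = 2627; // "Cannot insert duplicate key" constraint violation
const int MaxAttempts = 3;

for (int attempt = 1; attempt <= MaxAttempts; attempt++)
{
    try
    {
        DoInsert(); // hypothetical method wrapping the stored procedure calls
        break;      // success - stop retrying
    }
    catch (SqlException ex) when (ex.Number == DuplicateKeyErrorNumber && attempt < MaxAttempts)
    {
        // Another thread grabbed the same key; recalculate the ID and try again.
    }
}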
I don't think tracing is a good approach to reproducing a high concurrency environment, because the cost of tracing will itself skew the results and it's not really designed for that purpose. Playback won't necessarily be faithful to the timing of the incoming events.
I think you're better off creating specific load tests that will hopefully exercise the problems you're encountering, renting some virtual machines, and beating the heck out of a load-test DB.
Having said that, tracing is a good way to discover what the actual workload is. Sometimes, you're not seeing all the activity that's coming against your database. Maybe there are some "oh yeah" jobs running when the particular problems present themselves. Hundreds of possibilities I'm afraid - and not something that can be readily diagnosed without a lot more clues.
I have a performance problem. We have done a bunch of analysis and are stuck. Hopefully one of you has seen this before.
I'm calling DbContext.Database.SqlQuery; the database portion takes 3 ms, but the full execution takes 9 seconds.
We've used EF Profiler to discover this and we also run the SQL directly in SQL Server Management Studio and it is instantaneous.
We also used glimpse and couldn't see deep enough into the process.
The result type is not an entity from the model and therefore we are confident that tracking is not involved.
We also know that this is not the first query executed against the context therefore we are not paying EF startup cost on this query.
We tried the .net profiler and had so many problems running it that we decided we should just ask.
Any tips on how to dig in and figure this out?
EDIT: The result set for this query is 1 row with 4 columns (decimal)
The line of code is just:
var list = contextInstance.Database.SqlQuery<nonEntityType>(sqstring).ToList();
The SQL itself is not a very long string. We will use a more detailed profiler to find out where in the process this is getting hung up.
We've used EF profiler to discover this and we also run the SQL directly in SQL Server Management Studio and it is instantaneous.
This doesn't prove anything. The query might run fast, but it might return 100 MB of data which then has to be transported to the client and materialized into objects, and that might take more time than you think.
The query in SSMS might appear instantaneous because SSMS shows only the first part of the data. You didn't say how large the result set was.
Use a real .NET profiler, like dotTrace or ANTS. That way you can see exactly where the time is lost, down to the line. EF Prof (or my own ORM Profiler: http://www.ormprofiler.com) will tell you which part of the total route taken (ORM->DB->ORM) takes what time. Even EF Prof does ;)
If the client for some reason can't use a profiler as Frans suggests, you will have to play the guessing game and exclude possibilities.
First of all, I think a critical piece of information is missing: does it always take around 9 seconds, or does it vary?
First step:
Decide if the delay is before or after the query hits the database. It should be possible to do this with EF Profiler, or by looking at the timestamps in SQL Profiler.
Either way you will have limited the possibilities a bit.
Second step:
Exclude as much as possible
Indexes (No, the query is fast)
Returning too much data (No, according to the info you have)
Slow query compilation (No, raw sql query is used)
Slow data transfer (No, the other queries works well)
Slow DbContext initialization (No, you said it's not the first query)
Row or table locks (Not likely, that would probably show up as a long-running query in the profiler)
Slow materialization (No, too few fields unless there is a serious edge-case bug)
Third step:
What's left? That depends on the answer to #1 and also if it's always 9 seconds.
My prime suspects here are either a connection issue, because another call is blocking so it has to wait for a connection, or some second-level cache or similar that doesn't work well with this query.
To exclude some more alternatives, I would try to run the same query using plain old ADO.NET. If the problem persists, you know it's not an EF problem and very likely a connection issue. If it goes away, it could still be either issue, though.
Not so much an answer as some rants, but hopefully something you didn't think of already.
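For reference, a minimal sketch of the plain ADO.NET comparison suggested above (connection string and SQL text are placeholders), timed with a Stopwatch so the total round trip can be compared against the 3 ms the profiler reports for the database:
using System;
using System.Data.SqlClient;
using System.Diagnostics;

// Hypothetical values - substitute the real connection string and query.
var connStr = "Server=.;Database=MyDb;Integrated Security=true";
var sql = "SELECT ... /* the same SQL passed to SqlQuery */";

var sw = Stopwatch.StartNew();
using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand(sql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Read the decimal columns without any EF materialization.
            var value = reader.GetDecimal(0);
        }
    }
}
sw.Stop();
Console.WriteLine($"Plain ADO.NET round trip: {sw.ElapsedMilliseconds} ms");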
I am using old-school ADO.NET with C#, so there is a lot of this kind of code. Is it better to make one function per query and open and close the database connection each time, or to run multiple queries with the same connection object? Below is just one query, for example purposes only.
using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["DBConnectMain"].ConnectionString))
{
    // Add user to database, so they can't vote multiple times
    string sql = "insert into PollRespondents (PollId, MemberId) values (@PollId, @MemberId)";
    SqlCommand sqlCmd = new SqlCommand(sql, connection);
    sqlCmd.Parameters.Add("@PollId", SqlDbType.Int);
    sqlCmd.Parameters["@PollId"].Value = PollId;
    sqlCmd.Parameters.Add("@MemberId", SqlDbType.Int);
    sqlCmd.Parameters["@MemberId"].Value = Session["MemberId"];
    try
    {
        connection.Open();
        Int32 rowsAffected = (int)sqlCmd.ExecuteNonQuery();
    }
    catch (Exception ex)
    {
        //Console.WriteLine(ex.Message);
    }
}
Well, you could measure; but as long as you are wrapping the connections in using blocks (so they are disposed even if you get an exception), and have pooling enabled (for SQL Server it is enabled by default), it won't matter hugely; closing (or disposing) just returns the underlying connection to the pool. Both approaches work. Sorry, that doesn't help much ;p
Just don't keep an open connection while you do other lengthy non-db work. Close it and re-open it; you may actually get the same underlying connection back, but somebody else (another thread) might have made use of it while you weren't.
For most cases, opening and closing a connection per query is the way to go (as Chris Lively pointed out). However, there are some cases where you'll run into performance bottlenecks with this approach.
For example, when dealing with very large volumes of relatively quick to execute queries that are dependent on previous results, I might suggest executing multiple queries in a single connection. You might encounter this when doing batch processing of data, or data massaging for reporting purposes.
Always be sure to use the 'using' wrapper to avoid mem leaks though, regardless of which pattern you follow.
If the methods are structured such that a single command is executed within a single method, then Yes: instantiate and dispose of the connection for each command.
If the methods are structured such that you have multiple commands executed in the same block of code, then the outer block needs to be the using clause for the connection.
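A rough sketch of that second shape (the connection string, table names, and SQL are invented for illustration):
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // Several commands share the one connection for the duration of this block.
    using (var insertCmd = new SqlCommand("insert into dbo.AuditLog (Message) values (@msg)", connection))
    {
        insertCmd.Parameters.AddWithValue("@msg", "vote recorded");
        insertCmd.ExecuteNonQuery();
    }

    using (var countCmd = new SqlCommand("select count(*) from dbo.AuditLog", connection))
    {
        var total = (int)countCmd.ExecuteScalar();
    }
} // the connection goes back to the pool here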
ADO.NET is very good about connection pooling, so instantiating and disposing of the connection object for each command is going to be extremely fast and really won't impact performance.
As an example, we have a few pages that will execute up to 50 queries in order to compose the page. Because there is branching code to determine which queries to run, we have each of them wrapped in their own using (connection...) clauses.
We once ripped those out, grabbed one connection object, and passed it to the individual methods. This had exactly zero performance improvement while complicating the hell out of the code with all the exception clauses everywhere to ensure the connection was properly disposed at the end. At the end of the test, we rolled the code back to how it was before. Much cleaner to know exactly what was going on and when a connection was being used.
Well, as always, it depends. If you have 5 database calls to make within the same method call, you should probably use a single connection.
However, holding onto connection while nothing is happening isn't usually advised from a scalability standpoint.
ADO.NET is old school now? Wow, you just made me feel old. To me Rogue Wave ODBC using Borland C++ on Windows 3.1 is old school.
To answer: in general you want to understand how your data drivers work. Understand concepts such as connection pooling, and learn to profile the transaction costs associated with connecting/disconnecting and executing queries. Then take that knowledge and apply it to your situation.
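For example, a quick (and admittedly crude) way to see what pooled open/close actually costs - the connection string is a placeholder and the numbers will vary by environment:
using System;
using System.Data.SqlClient;
using System.Diagnostics;

var connStr = "Server=.;Database=MyDb;Integrated Security=true"; // placeholder

var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    // With pooling (the default), each Open/Close pair just borrows and
    // returns a pooled physical connection, so this loop is typically cheap.
    using (var conn = new SqlConnection(connStr))
    {
        conn.Open();
    }
}
sw.Stop();
Console.WriteLine($"1000 open/close cycles: {sw.ElapsedMilliseconds} ms");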
We currently have a little situation on our hands - it seems that someone, somewhere forgot to close the connection in code. The result is that the pool of connections is relatively quickly exhausted. As a temporary patch we added Max Pool Size = 500; to our connection string on the web service, and we recycle the pool when all connections are spent, until we figure this out.
So far we have done this:
SELECT SPId
FROM MASTER..SysProcesses
WHERE DBId = DB_ID('MyDb') and last_batch < DATEADD(MINUTE, -15, GETDATE())
to get the SPIDs that haven't been used for 15 minutes. We're now trying to get the query that was last executed on each such SPID with:
DBCC INPUTBUFFER(61)
but the queries displayed vary, meaning either something at the base level of connection handling is broken, or our deduction is erroneous...
Is there an error in our thinking here? Does the DBCC / sysprocesses approach give the results we're expecting, or is there some side-effect catch (for example, do connections sitting in the pool influence it)?
(please, stick to what we could find out using SQL since the guys that did the code are many and not all present right now)
I would expect that there is a myriad of different queries 'remembered' by the input buffer - depending on the timing of your failures and the variety of queries you run, it seems unlikely that you'd see consistent queries this way. Recall that the leaked connections will eventually be closed, but only when they're GC'd and finalized.
As Mitch suggests, you need to scour your source for connection opens and ensure they're localized and wrapped in a using(). Also look for possibly long-lived objects that might be holding on to connections. In an early version of our catalog, ASP page objects held connections that weren't managed properly.
To narrow it down, can you monitor connection counts (PerfMon) as you focus on specific portions of your app? Does it happen more in CRUD areas vs. reporting or other queries? That might help narrow down the source scouring you need to do.
Are you able to change the connection strings to contain information about where and why the connection was created, in the Application Name field?
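For example, a sketch of stamping each caller into its connection string with SqlConnectionStringBuilder (the base connection string and caller names are placeholders); the value shows up in the program_name column of sysprocesses / sys.dm_exec_sessions, so idle or leaked connections can be traced back to a component:
using System.Data.SqlClient;

static string BuildConnectionString(string baseConnectionString, string caller)
{
    var builder = new SqlConnectionStringBuilder(baseConnectionString)
    {
        // Appears as program_name on the server side.
        ApplicationName = "MyWebService - " + caller
    };
    return builder.ConnectionString;
}

// Usage (hypothetical):
// using (var conn = new SqlConnection(BuildConnectionString(baseConnStr, "PollRepository")))
// { ... }
Note that each distinct Application Name value gets its own connection pool (pools are keyed on the exact connection string), which is worth keeping in mind when interpreting pool-exhaustion behavior.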