Entity Framework: timeout expired exceptions occurring during peak API activity - C#

In an OpenShift cluster environment, we are seeing periods of peak processing during which Entity Framework Core 2.1 throws various timeout expired exceptions. I am in the process of upgrading the app to .NET 6.0. The exceptions occur on queries to SQL Server and do not appear to be related to memory or processing resources. It seems more likely that load on the network is causing connection timeouts to the SQL Server.
Some of the exceptions include the following detail:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
System.ComponentModel.Win32Exception (258): Unknown error 258
Looking at next steps, I have read a bit about Entity Framework interceptors. Would interceptors be a good fit for debugging these timeout expired issues? Could I use an interceptor to capture the timeout exception details from inside Entity Framework itself?
I am looking for best practices for pinpointing the causes of database connection timeouts under peak load, where even simple queries time out.
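For example, something like this minimal command interceptor sketch is what I have in mind (interceptors require EF Core 3.0 or later, so this assumes the .NET 6.0 upgrade is complete; the type name and logging are illustrative):

    using System;
    using System.Data.Common;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore.Diagnostics;

    // Logs every command that fails, surfacing the SQL text and the
    // elapsed time of queries that hit the timeout.
    public class TimeoutLoggingInterceptor : DbCommandInterceptor
    {
        public override void CommandFailed(DbCommand command, CommandErrorEventData eventData)
        {
            Console.WriteLine(
                $"Command failed after {eventData.Duration.TotalSeconds:F1}s " +
                $"with {eventData.Exception.GetType().Name}:\n{command.CommandText}");
        }

        public override Task CommandFailedAsync(DbCommand command, CommandErrorEventData eventData,
            CancellationToken cancellationToken = default)
        {
            CommandFailed(command, eventData);
            return Task.CompletedTask;
        }
    }

    // Registered where the context is configured, e.g.:
    // optionsBuilder.AddInterceptors(new TimeoutLoggingInterceptor());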

Related

Cannot open even one connection: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool

I use Entity Framework 6.4.4 in my multithreaded server application to connect to a SQL Server database on another server across the internet.
Before using the DbContext, I open the connection by calling MyDbContext.Database.Connection.Open(). Multiple threads may try to open the connection at the same time, but sometimes I get an InvalidOperationException with the message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
I am sure I close all connections after use, and no connection stays open for more than about 2 seconds. I am also sure there are not many simultaneous connections.
The SQL Server can support about 1500 simultaneous connections (I have tested this), but I think the problem is with opening a connection, not with having many connections open.
I tested my application on a server with 40 logical processors and it works fine. But when I move the application to a server with 4 logical processors, it works correctly for a while and then cannot open even a single connection. I limited the number of threads that try to open connections simultaneously to as few as 3; it didn't help, and I get the exception continuously. CPU usage stays below 50%, and there is free memory.
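For what it's worth, the pattern that guarantees a connection goes back to the pool is a short-lived context per unit of work, letting EF open and close the connection itself rather than calling Open() manually (a simplified sketch; MyDbContext, Order, and CreatedUtc are placeholder names):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Disposing the context returns the underlying connection to the
    // ADO.NET pool even when a query throws.
    public List<Order> GetRecentOrders(DateTime cutoff)
    {
        using (var context = new MyDbContext())
        {
            return context.Orders
                .Where(o => o.CreatedUtc > cutoff)
                .ToList();
        }
    }

If checkout contention is real, raising Max Pool Size in the connection string (the default is 100) is another lever, though it usually hides a leak rather than fixing one.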

DbContext pooling in Azure Functions

I have an Azure Function running on a consumption plan. When the function is under heavy load, I've been getting System.InvalidOperationException with the message The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
I'm using dependency injection, and so far I've been injecting my Entity Framework Core DbContext by using AddDbContextPool. Is DbContext pooling recommended for Azure Functions, or should I rather use AddDbContext?
The connection string to my SQL Server only specifies the server and authentication, meaning that connection pooling should also be enabled by default. Is connection pooling also recommended for Azure Functions?
Apparently, AddDbContextPool is not the same as connection pooling. Instead, it's a pool of DbContext instances: https://stackoverflow.com/a/48444206/5099097
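For reference, the two registrations look like this (a sketch assuming a typical Functions Startup class; MyDbContext and connectionString are placeholders):

    // DbContext pooling: a fixed-size pool of context instances is
    // reset and reused across invocations.
    builder.Services.AddDbContextPool<MyDbContext>(options =>
        options.UseSqlServer(connectionString));

    // Plain registration: a fresh context instance per scope.
    builder.Services.AddDbContext<MyDbContext>(options =>
        options.UseSqlServer(connectionString));

Neither choice changes ADO.NET connection pooling; a pooled context still opens and closes its underlying connection per operation.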
Also, according to https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections#sqlclient-connections, EF and EF Core implement connection pooling by default since they use ADO.NET and that's the default for ADO.NET.
Your function code can use the .NET Framework Data Provider for SQL Server (SqlClient) to make connections to a SQL relational database. This is also the underlying provider for data frameworks that rely on ADO.NET, such as Entity Framework. Unlike HttpClient and DocumentClient connections, ADO.NET implements connection pooling by default. But because you can still run out of connections, you should optimize connections to the database. For more information, see SQL Server Connection Pooling (ADO.NET).
I think your best bet is to dial down the concurrency like you already said in the comments, but in addition to that it's important to note that connection pooling is managed on the client side (the Azure Function in this case): SQL Server: where is connection pool: on .net side or server side. So while your func app can take advantage of connection pooling, each instance of the func app has its own connection pool as it scales out. The scalability benefits of connection pooling are therefore smaller than they would be for a single client app managing a single connection pool.
Therefore, to get more benefit from connection pooling per instance of your func app, you should do more work per Service Bus trigger. For example, batch several queries together under the same trigger instead of one query per trigger. Also, if you're doing writes in other triggers, you can batch several update/insert operations together in one func app trigger. According to this, 42 non-query operations is the optimal batch size in EF Core: https://learn.microsoft.com/en-us/ef/core/performance/efficient-updating#batching
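To illustrate the batching idea (entity and variable names are placeholders): EF Core already groups the statements from a single SaveChanges call into batches, so accumulating changes and saving once per trigger beats saving per item:

    // One trigger invocation accumulates all its changes; EF Core
    // batches the resulting statements (up to MaxBatchSize, which the
    // linked page puts at 42 for SQL Server) over one pooled connection.
    foreach (var item in messageBatch)
    {
        context.Readings.Add(new Reading { DeviceId = item.DeviceId, Value = item.Value });
    }
    await context.SaveChangesAsync();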
But even better than that is using table-valued parameters to make bulk updates to hundreds or thousands of records at a time. After I made these changes, my errors from hitting the connection limit went away.
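A sketch of the table-valued parameter approach (it assumes a user-defined table type dbo.IdList and a stored procedure dbo.UpdateRecords already exist on the server; names are illustrative):

    using System.Data;
    using Microsoft.Data.SqlClient;
    using Microsoft.EntityFrameworkCore;

    var ids = new DataTable();
    ids.Columns.Add("Id", typeof(int));
    foreach (var id in recordIds)
        ids.Rows.Add(id);

    var parameter = new SqlParameter("@ids", SqlDbType.Structured)
    {
        TypeName = "dbo.IdList",   // user-defined table type on the server
        Value = ids
    };

    // One round trip updates the whole set instead of one command per row.
    await context.Database.ExecuteSqlRawAsync("EXEC dbo.UpdateRecords @ids", parameter);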

Transient Fault Handling with Azure database

We are using ReliableSqlConnection from Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling to get transient fault handling against an Azure SQL database; we are on .NET 4.6.1. The problem is that we randomly get this exception when opening a connection with this class:
"Internal .Net Framework Data Provider error 6"
ReliableSqlConnection does not support async/await, while the built-in transient fault handling in ADO.NET on .NET 4.6.1 (ConnectRetryInterval and ConnectRetryCount) works only at the connection level, not at the command level.
Is there any alternative to ReliableSqlConnection that supports retries at the command level with async/await and avoids the above exception?
We use an Elastic Pool on Azure SQL with the Standard tier.
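One commonly suggested alternative (an assumption on my part, not something the thread confirms) is a Polly retry policy around individual commands, which gives command-level retries with async/await; the set of error numbers treated as transient is illustrative and should be tuned:

    using System;
    using System.Data.SqlClient;
    using System.Threading.Tasks;
    using Polly;

    // Retry transient SQL errors at the command level with exponential backoff.
    var retryPolicy = Policy
        .Handle<SqlException>(ex => ex.Number == -2      // client-side timeout
                                 || ex.Number == 40613)  // Azure SQL database unavailable
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

    var result = await retryPolicy.ExecuteAsync(async () =>
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT 1", connection))  // placeholder query
        {
            await connection.OpenAsync();
            return await command.ExecuteScalarAsync();
        }
    });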

EF + multiple transactions, multiple contexts

I have been trying everything to make this situation work, but with no success so far:
Entity Framework
xUnit (testing library)
2 DbContexts (2 different databases, 2 connection strings)
Situation: run an integration test with the AutoRollBack feature (AutoRollBack mainly wraps the code in a parent transaction which is rolled back at the end of the test).
The test looks like:

    [AutoRollBack]
    void Test()
    {
        // Operation1 against DB1
        // Operation2 against Db2
    }
I enabled MSDTC on both SQL Servers and used the DTCPing tool to confirm that communication between them is OK.
I enabled Distributed Transactions in the inbound and outbound firewall rules on both servers.
I added Distributed Transactions to the allowed programs in the firewalls of both servers.
Both servers can ping each other by NetBIOS name.
But the 2nd operation in the test always fails with "The underlying provider failed on Open":
The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems. Possible causes are: a firewall is present and it doesn't have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers. (Exception from HRESULT: 0x8004D02B)
I am looking for another way of debugging the problem. Is there a way to get logs of some sort, for example?
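One debugging sketch (my suggestion, not from the thread): log the ambient transaction's state between the two operations; a DistributedIdentifier other than Guid.Empty confirms the transaction has escalated to MSDTC:

    using System;
    using System.Transactions;

    // Inside the test, between Operation1 and Operation2:
    var info = Transaction.Current.TransactionInformation;
    Console.WriteLine($"Status: {info.Status}");
    Console.WriteLine($"Local ID: {info.LocalIdentifier}");
    // Guid.Empty means the transaction is still local; anything else
    // means it has been promoted to a distributed (MSDTC) transaction.
    Console.WriteLine($"Distributed ID: {info.DistributedIdentifier}");

MSDTC itself can also write trace logs (Component Services > Local DTC properties > Tracing), which is the closest thing to built-in logging for this error.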

DTC issues using Oracle and .NET 4 - RM_COMMIT_DELIVERY_FAILED_DUE_TO_CONNECTION_DOWN

First, a little intro to our setup:
WCF based app with EF 4 context injected using Unity (no singleton)
Oracle running on a separate physical machine
NServiceBus handling messages that access Oracle through the same context as above
The problem we are experiencing, only in our UAT environment, is that we cannot send multiple messages without running into distributed transaction locks in DTC. The DTC trace tells us this:
1. TRANSACTION_COMMITTED
2. RM_ISSUED_COMMIT
3. RM_ISSUED_COMMIT
4. RM_ACKNOWLEDGED_COMMIT
5. RM_COMMIT_DELIVERY_FAILED_DUE_TO_CONNECTION_DOWN
Any bright ideas?
It seems the problem lay within our client app's WCF configuration.
Deep down in our framework we were setting TransactionFlow = true, which tries to set up a transaction scope starting from the client. When we ran our request and fired off an NServiceBus message, we lost the link with our client and could not commit the transaction.
So TransactionFlow = false in app.config saved us.
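For reference, the equivalent setting in code looks roughly like this (a sketch; the binding type, contract, and address are assumptions, since the original fix was made in app.config):

    using System.ServiceModel;

    // Stop the client from flowing its transaction to the service,
    // which is what left the DTC commit hanging once NServiceBus was involved.
    var binding = new WSHttpBinding
    {
        TransactionFlow = false
    };
    var factory = new ChannelFactory<IMyService>(binding,
        new EndpointAddress("http://example.com/MyService"));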
