My program hangs on the .Open() method for longer than the specified 2-second timeout when connecting to SQL Server.
I managed to reproduce the problem when the machine the database is on is offline or restarting. When that machine comes back up, the program un-hangs.
How can I force an exception (or anything else) to be thrown when the 2-second timeout is exceeded?
I tried both the OleDb and SqlConnection providers; no difference.
I can ping the machine before connecting to it, but there is still the case where the ping succeeds (say, 1 second before the machine shuts down) and then Open() hangs...
Example code provided below.
public static IDbConnection GetConnection(IDbConnection connection)
{
connection.ConnectionString = "connectionString; Connection Timeout=2;";
connection.Open();
return connection;
}
The Connection Timeout property covers only the time needed to establish the connection; everything after that (and there is a lot to do after the connection is established) does not count toward it and may take indefinitely (unless there is another timeout I'm not aware of).
What you can do is execute your code on a separate thread with a watchdog that limits total execution time to two seconds. Using tasks it is pretty easy:
const int HardConnectionTimeoutInMilliseconds = 2000;
if (!Task.Run(() => connection.Open()).Wait(HardConnectionTimeoutInMilliseconds))
return null; // Timeout!
Just for completeness, here is the old-style threading code for the same approach:
Thread worker = new Thread(delegate()
{
connection.Open();
});
worker.Start();
if (!worker.Join(TimeSpan.FromSeconds(2)))
return null;
Be careful with such a short timeout: for a TCP connection, two seconds is almost always too short, and if you're using Windows authentication with AD it may take longer than you expect.
In my opinion you have to live with this lag (15 to 30 seconds is a safe and reasonable timeout for a TCP connection with integrated security). You may also want to wait longer and retry, because errors may be transient (see Know when to retry or fail when calling SQL Server from C#?); a rough sketch of such a retry loop follows below. Note that the situation you're describing (the server going down) is pretty unusual, so IMO it shouldn't affect normal operations. If the delay is an issue for your UI, make the connection attempt run in parallel so the UI stays responsive.
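If you do decide to retry, a minimal sketch could look like the following; the attempt count, delays, and helper name are illustrative assumptions rather than values from the answer above, and SqlConnection stands in for whatever IDbConnection implementation you actually use.
using System;
using System.Data.SqlClient;
using System.Threading;

static SqlConnection OpenWithRetry(string connectionString, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        var connection = new SqlConnection(connectionString);
        try
        {
            connection.Open();          // rely on the provider's normal (longer) timeout
            return connection;          // the caller disposes the connection when done
        }
        catch (SqlException)
        {
            connection.Dispose();
            if (attempt >= maxAttempts) throw;
            Thread.Sleep(TimeSpan.FromSeconds(5 * attempt));   // simple linear back-off
        }
    }
}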
Related
How can I create a server side script to ping clients on mysql and windows form to see if they are still logged in?
For example, ping a client and have it do something in return to verify that it is still online.
You can't directly ping a client from the server, but there are some options.
About timeouts
I think in any case you have to use the server-side timeout (wait_timeout). This timeout can be set server-wide or upon connecting from the client, e.g.:
SET SESSION wait_timeout = 60
This is required because a client can suddenly disappear without even closing its TCP connection, and an explicit timeout lets MySQL close the connection and free resources after wait_timeout seconds of client inactivity. According to the MySQL manual the default is rather large: 28800 seconds.
There is a drawback. If during normal operation your client may be inactive for more than wait_timeout seconds, then either the client should know how to deal with a closed connection (reconnect when the database says it has gone away) or it should send "ping" queries (like SELECT 1) at least every wait_timeout - 1 seconds, as in the sketch below.
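For illustration, a client-side keep-alive loop could look something like this minimal sketch (assuming the MySql.Data client; KeepAliveAsync and its interval parameter are hypothetical names, and the interval would be chosen somewhat below wait_timeout):
using System;
using System.Threading;
using System.Threading.Tasks;
using MySql.Data.MySqlClient;

static async Task KeepAliveAsync(MySqlConnection conn, TimeSpan interval, CancellationToken ct)
{
    // Send a trivial query often enough that wait_timeout never expires.
    // Run this only while the connection is otherwise idle; MySqlConnection is not thread-safe.
    while (!ct.IsCancellationRequested)
    {
        try { await Task.Delay(interval, ct); }
        catch (OperationCanceledException) { break; }
        using var cmd = new MySqlCommand("SELECT 1", conn);
        await cmd.ExecuteScalarAsync(ct);
    }
}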
Using get_lock() function
Since MySQL 5.7 (and MariaDB since 10.0.2) you can hold multiple user-level locks per session.
A lock obtained with GET_LOCK() is released explicitly by executing RELEASE_LOCK() or implicitly when your session terminates (either normally or abnormally). Locks obtained with GET_LOCK() are not released when transactions commit or roll back.
So the idea is to issue a GET_LOCK query when the client connects, e.g.:
SELECT GET_LOCK('logged_in_{CLIENT_ID}', timeout)
You can set timeout to 0 to tell immediately that the client cannot log in, or you can wait (blocking) for at most wait_timeout seconds to be sure another client really holds the lock.
The lock is released automatically by the server when the client disconnects or after wait_timeout seconds of inactivity.
If the lock is free, GET_LOCK() returns 1; otherwise (after waiting timeout seconds) it returns 0.
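As a rough illustration from the client side (again assuming the MySql.Data client; TryLogIn and the lock-name pattern are hypothetical), the login check could look like this:
using System;
using MySql.Data.MySqlClient;

static bool TryLogIn(MySqlConnection conn, string clientId)
{
    // Timeout 0: return immediately instead of waiting for the other session.
    using var cmd = new MySqlCommand(
        "SELECT GET_LOCK(CONCAT('logged_in_', @clientId), 0)", conn);
    cmd.Parameters.AddWithValue("@clientId", clientId);
    // GET_LOCK returns 1 when the lock was acquired, 0 when another session already holds it.
    var result = cmd.ExecuteScalar();
    return result != null && result != DBNull.Value && Convert.ToInt32(result) == 1;
}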
Using process id
If you don't want to use locks, the connection's process ID can be used instead.
When a client connects, instead of writing islogged = 'YES' you can store the current CONNECTION_ID() as the value.
Before logging in, you can check that there is no active process for the current client, like this:
SELECT islogged FROM logged
INNER JOIN information_schema.processlist ON
processlist.id = logged.islogged
WHERE
client_id = ...
If the above query returns nothing, you can then upsert the new connection ID into the logged table:
REPLACE INTO logged SET islogged = CONNECTION_ID(), client_id = ...
I would prefer GET_LOCK() because it seems easier, does not suffer from concurrency issues, and allows you to implement waiting.
Don't forget that timeouts are essential: you have to handle reconnection or send regular pings to avoid unexpected "server has gone away" errors in the client.
Basic Intro:
Connection string: Host=IP;Port=somePort;Username=someUser;Password=somePass;Database=someDb;Maximum Pool Size=100
My web application has several dozen endpoints available via WebSocket and HTTP. Every one of these endpoints opens a new Npgsql connection (all using the same connection string as above), processes data, then closes the connection via the using statement.
Issue:
When the application restarts for an update, there are typically 2,000-3,000 users all reconnecting. This leads to errors about the connection pool being full and new connections being rejected because there are too many clients already. However, once the application finally comes online it typically only uses between 5 and 10 connections at any given time.
Question:
Is the logic below the proper way to use connection pooling, with every endpoint creating a new Npgsql connection using the same connection string and a maximum pool size of 100?
It seems that the connection pool often shoots right up to 100, yet ~80 of those 100 connections are shown as idle in a DB viewer while new connection requests are being denied due to pool exhaustion.
Better option?
I could also try to force a more "graceful" startup by slowly allowing new users to reconnect, but I'm not sure whether the logic of creating a new connection in every endpoint is correct in the first place.
// DB Connection String - Used for all NPGSQL connections
const string connectionStr = "Host=IP;Port=somePort;Username=someUser;Password=somePass;Database=someDb;Maximum Pool Size=100";
// Endpoint 1 available via Websocket
public async Task someRequest(someClass someArg)
{
/* Create a new SQL connection for this user's request using the same global connections string */
using var conn = new NpgsqlConnection(connectionStr);
conn.Open();
/* Call functions and pass this SQL connection for any queries to process this user request */
somefunction(conn, someArg);
anotherFunction(conn, someArg);
/* Request processing is done */
/* conn is closed automatically by the "using" statement above */
}
// Endpoint 2 available via Websocket
public async Task someOtherRequest(someClass someArg)
{
/* Create a new SQL connection for this user's request using the same global connections string */
using var conn = new NpgsqlConnection(connectionStr);
conn.Open();
/* Call functions and pass this SQL connection for any queries to process this user request */
somefunction(conn, someArg);
anotherFunction(conn, someArg);
/* Request processing is done */
/* conn is closed automatically by the "using" statement above */
}
// endpoint3();
// endpoint4();
// endpoint5();
// endpoint6();
// etc.
EDIT:
I've made the suggested change, closing connections and returning them to the pool during processing. However, the issue still persists on startup.
1. Application startup: 100 connections are claimed for pooling. Almost all of them are idle, the application receives connection pool exhaustion errors, and little to no transactions are processed.
2. Transactions suddenly start churning; I'm not sure why. Is this after some sort of timeout, perhaps? I know there was a 300-second default timeout in the documentation somewhere... this might match up here.
3. Transactions lock up again and the pool exhaustion errors resume.
4. Transactions spike and resume, and user requests start coming through again.
5. The application levels out as normal.
EDIT 2:
This startup issue consistently takes about 5 minutes from startup to clear the backlog of idle transactions and start running all of the queries.
I know 5 minutes is the default value for idle_in_transaction_session_timeout. However, I tried running SET SESSION idle_in_transaction_session_timeout = '30s'; (and 10s) during the startup this time and it didn't seem to have any impact at all.
I'm not sure why those 100 pooled connections would be stuck idle like that on startup, taking 5 minutes to clear before queries can run, if that's what is happening...
I had forgotten to update this post with some of the latest information. There were a few other internal optimizations I had made in the code.
One of the major ones was simply changing conn.Open(); to await conn.OpenAsync(); and conn.Close(); to await conn.CloseAsync();.
Everything else was already properly async, but opening and closing the new connections in Npgsql was still blocking I/O, which hurt performance during large bursts.
A very obvious change, but I hadn't even thought to look for async methods for opening and closing connections.
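In a minimal form the change looks roughly like this (the surrounding handler is the same sketch as in the question):
using var conn = new NpgsqlConnection(connectionStr);
await conn.OpenAsync();      // was: conn.Open(); -- no longer blocks a thread-pool thread
/* ... queries ... */
await conn.CloseAsync();     // was: conn.Close();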
A connection is released to the pool once you close it in your code. From what you wrote, you are keeping it open for the entire duration of a request, so basically 1 user = 1 connection and the pool is just used as a waiting room (the Timeout setting, 15 seconds by default). Open and close the connection each time you need to access the DB, so the connection is returned to the pool and can be used by another user while time is spent in .NET code.
Example, in pseudo code:
Enter function
Do some computations in .net, like input validation
Open connection (grab it from the pool)
Fetch info #1
Close connection (return it to the pool)
Do some computations in .net, like ordering the result, computing an age from a date etc
Open connection (grab it from the pool)
Fetch info #2
Close connection (return it to the pool)
Do some computations in .net
Return
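Translated into C#, that pseudo code might look roughly like the sketch below. Validate, OrderResults, FetchInfo1Async/FetchInfo2Async, and SomeInfo are hypothetical placeholders, and await using assumes a recent Npgsql version (otherwise a plain using block works the same way):
public async Task HandleRequestAsync(someClass someArg)
{
    Validate(someArg);                               // .NET-only work, no connection held

    SomeInfo info;
    await using (var conn = new NpgsqlConnection(connectionStr))
    {
        await conn.OpenAsync();                      // grab a connection from the pool
        info = await FetchInfo1Async(conn, someArg);
    }                                                // disposed here -> returned to the pool

    var ordered = OrderResults(info);                // more .NET-only work, pool not touched

    await using (var conn = new NpgsqlConnection(connectionStr))
    {
        await conn.OpenAsync();
        await FetchInfo2Async(conn, ordered);
    }
}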
I'm using Azure Functions V1 with StackExchange.Redis 1.2.6. The function receives thousands of messages per minute, and for every message and every device I check Redis. I noticed that when we have more messages than usual, we get the error below.
Exception while executing function: TSFEventRoutingFunction No connection is available to service this operation: HGET GEO_DYNAMIC_hash; It was not possible to connect to the redis server(s); ConnectTimeout; IOCP: (Busy=1,Free=999,Min=24,Max=1000), WORKER: (Busy=47,Free=32720,Min=24,Max=32767), Local-CPU: n/a It was not possible to connect to the redis server(s); ConnectTimeout
CacheService, as recommended by MS:
public class CacheService : ICacheService
{
private readonly IDatabase cache;
private static readonly string connectionString = ConfigurationManager.AppSettings["RedisConnection"];
public CacheService()
{
this.cache = Connection.GetDatabase();
}
private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
{
return ConnectionMultiplexer.Connect(connectionString);
});
public static ConnectionMultiplexer Connection
{
get
{
return lazyConnection.Value;
}
}
public async Task<string> GetAsync(string hashKey, string ruleKey)
{
return await this.cache.HashGetAsync(hashKey, ruleKey);
}
}
I'm injecting ICacheService into the Azure Function and calling the GetAsync method on every request.
I'm using an Azure Redis instance (C3 tier).
As you can see, I currently have a single connection. Would creating multiple connections help solve this issue? Or do you have any other suggestions to solve or understand it?
There are many different causes of the error you are getting. Here are some I can think of off the top of my head (not in any particular order):
Your connectTimeout is too small. I often see customers set a small connect timeout because they think it will ensure that the connection is established within that time span. The problem with this approach is that when something goes wrong (high client CPU, high server CPU, etc.), the connection attempt will fail, which often makes a bad situation worse: instead of helping, it forces the system to restart the process of trying to reconnect, often resulting in a connect -> fail -> retry loop. I generally recommend that you leave your connectTimeout at 15 seconds or higher. It is better to let your connection attempt succeed after 15 or 20 seconds than to have it fail after 5 seconds repeatedly, resulting in an outage lasting several minutes until the system finally recovers.
A server-side failover occurs. A connection is severed by the server as a result of some type of failover from master to replica. This can happen if the server-side software is updated at the Redis layer, the OS layer or the hosting layer.
A networking infrastructure failure of some type (hardware sitting between the client and the server sees some type of issue).
You change the access password for your Redis instance. Changing the password resets the connections of all clients to force them to re-authenticate.
Thread pool settings need to be adjusted. If your thread pool settings are not tuned correctly for your workload, you can run into delays spinning up new threads, as explained here (a configuration sketch follows below).
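As an illustration of points 1 and 5, the client-side configuration could look roughly like this sketch; the timeout value and minimum thread counts are illustrative only, and connectionString is the same value the question reads from app settings:
using System;
using System.Threading;
using StackExchange.Redis;

string connectionString = "...";   // e.g. ConfigurationManager.AppSettings["RedisConnection"], as in the question

// 1) A more forgiving connect timeout, instead of a small one that fails and retries in a loop.
var options = ConfigurationOptions.Parse(connectionString);
options.ConnectTimeout = 15000;        // milliseconds; 15 s or higher as recommended above
options.AbortOnConnectFail = false;    // keep retrying in the background instead of failing hard
var multiplexer = ConnectionMultiplexer.Connect(options);

// 5) Raise the thread pool floor so bursts don't wait on the pool's slow thread ramp-up.
ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
ThreadPool.SetMinThreads(Math.Max(minWorker, 200), Math.Max(minIocp, 200));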
I have written a bunch of best practices for Redis that will help you avoid other problems as well.
We solved this issue by upgrading StackExchange.Redis to 2.1.30.
I recently noticed that when our application runs an SQL query against the Oracle database, it always takes at least 200 ms to execute. It doesn't matter how simple or complex the query is; the minimum time is about 200 ms. We're using the Oracle Managed Data Access driver for Oracle 11g.
I then created a simple console application to test the connection. I noticed that if I create the connection as in the example below, every cmd.ExecuteReader call takes the extra 200 ms (because the connection is being opened?).
using (OracleConnection con = new OracleConnection(connStr))
{
con.Open();
OracleCommand cmd = con.CreateCommand();
...
}
The connection state is always Closed when I create the connection like this (shouldn't it be Open if the connections are pooled?).
If I open the connection at the start of the program and then pass the opened connection to the method, cmd.ExecuteReader takes about 0-5 ms to return. I've tried adding Pooling=true to the connection string, but it doesn't seem to do anything (it should be the default anyway).
Does this mean that the connection pooling is not working as it should? Or could there be any other reason why the cmd.ExecuteReader takes the extra 200 ms to execute?
The problem is almost the same as in this question, except that we're using Oracle: Connection pooling is slower than keeping one connection open
Is your database remote, with the delay caused by the network? In that case connection pooling works, but the problem is that there is always a TCP communication round trip (not even carrying a TNS packet). Unfortunately this happens on every Open call.
The managed data access implementation communicates in a different way, so the overhead occurs only on the very first Open call; after that, the Open method is essentially free.
After a lot of testing and research, I finally figured out where the extra 200 ms came from: my virtual machine's network adapter. I'm using VMware Player and the connection was configured in "NAT" mode. When I changed the connection to "Bridged" mode, the latency went away.
I've been getting this error recently, after several reloads of the same page:
System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached
So I am thinking there must be some queries or calls in the app that I used incorrectly, causing them not to release their connections. Are there any tools out there that would let me peek into the pool and see who is hanging on to what?
There's a timeout property on your connection object that you can change; it controls how long the code waits to obtain a connection. There's also a command timeout, which controls how long a command may run once it is executing, but the first one sounds like what you need; see here (anything that inherits from DbConnection should have this, if you aren't using SQL Server). A small sketch of both timeouts follows below.
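For reference, a small sketch of where the two timeouts are set (values are illustrative; for SqlConnection the connection timeout is supplied via the connection string, since the ConnectionTimeout property itself is read-only):
using System.Data.SqlClient;

using var conn = new SqlConnection("Server=...;Database=...;Connect Timeout=30;");  // time allowed to obtain/open a connection
conn.Open();

using var cmd = conn.CreateCommand();
cmd.CommandTimeout = 60;   // seconds the command may run before timing out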
Have a look here too, might help :)