I'm using Azure Functions V1 with StackExchange.Redis 1.2.6. The function receives thousands of messages per minute, and for every message and every device I check Redis. I noticed that when message volume is high, we get the error below.
Exception while executing function: TSFEventRoutingFunction No connection is available to service this operation: HGET GEO_DYNAMIC_hash; It was not possible to connect to the redis server(s); ConnectTimeout; IOCP: (Busy=1,Free=999,Min=24,Max=1000), WORKER: (Busy=47,Free=32720,Min=24,Max=32767), Local-CPU: n/a It was not possible to connect to the redis server(s); ConnectTimeout
CacheService as recommended by MS
public class CacheService : ICacheService
{
    private readonly IDatabase cache;

    private static readonly string connectionString = ConfigurationManager.AppSettings["RedisConnection"];

    public CacheService()
    {
        this.cache = Connection.GetDatabase();
    }

    private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
    {
        return ConnectionMultiplexer.Connect(connectionString);
    });

    public static ConnectionMultiplexer Connection
    {
        get
        {
            return lazyConnection.Value;
        }
    }

    public async Task<string> GetAsync(string hashKey, string ruleKey)
    {
        return await this.cache.HashGetAsync(hashKey, ruleKey);
    }
}
I'm injecting ICacheService into the Azure Function and calling the GetAsync method on every request.
Using an Azure Redis instance (C3).
As you can see, I currently have a single connection. Would creating multiple connections help solve this issue? Or is there any other suggestion to solve or understand it?
There are many different causes of the error you are getting. Here are some I can think of off the top of my head (not in any particular order):
Your connectTimeout is too small. I often see customers set a small connect timeout because they think it will ensure that the connection is established within that time span. The problem with this approach is that when something goes wrong (high client CPU, high server CPU, etc.), the connection attempt will fail. This often makes a bad situation worse: instead of helping, it aggravates the problem by forcing the system to restart the process of trying to reconnect, often resulting in a connect -> fail -> retry loop. I generally recommend that you leave your connectTimeout at 15 seconds or higher. It is better to let your connection attempt succeed after 15 or 20 seconds than to have it fail after 5 seconds repeatedly, resulting in an outage lasting several minutes until the system finally recovers.
A server-side failover occurs. A connection is severed by the server as a result of some type of failover from master to replica. This can happen if the server-side software is updated at the Redis layer, the OS layer or the hosting layer.
A networking infrastructure failure of some type (hardware sitting between the client and the server sees some type of issue).
You change the access password for your Redis instance. Changing the password will reset connections to all clients to force them to re-authenticate.
Thread Pool Settings need to be adjusted. If your thread pool settings are not adjusted correctly for your workload, then you can run into delays in spinning up new threads, as explained here. (A short sketch after this list shows adjusting both the thread pool minimums and the connect timeout.)
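For the first and last points above, here is a minimal sketch of how the connect timeout and thread pool minimums could be raised at startup. It assumes StackExchange.Redis's ConfigurationOptions; the concrete values (20 seconds, 200 minimum threads) are illustrative assumptions, not tuned recommendations:

using System.Threading;
using StackExchange.Redis;

// Raise the thread pool minimums before the first burst of work arrives,
// so the pool does not throttle thread creation under sudden load.
// 200 is an assumed value; size it for your own workload.
ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);

// Give the multiplexer a generous connect timeout instead of a small one.
var options = ConfigurationOptions.Parse(connectionString); // connectionString from your app settings
options.ConnectTimeout = 20000;       // 20 seconds, in milliseconds
options.AbortOnConnectFail = false;   // keep retrying in the background instead of failing fast

var connection = ConnectionMultiplexer.Connect(options);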
I have written a bunch of best practices for Redis that will help you avoid other problems as well.
We solved this issue by upgrading StackExchange.Redis to 2.1.30.
Related
I'm using .NET Framework 4.8 for a console application that manages an ETL process, and a .NET Standard 2.0 library for the HTTP requests, which uses HttpClient. The application is expected to handle millions of records and is long-running.
These requests are made in parallel with a maximum concurrency limit of 20.
At application launch, I increase the number of connections .NET can make to a single host in a connection pool via the ServicePointManager; the .NET Framework default of 2 is a frequent cause of connection pool starvation.
public static async Task Main(string[] args)
{
    ServicePointManager.DefaultConnectionLimit = 50;
    ...
}
This is greater than the number of max concurrent requests allowed to ensure my requests are not being queued up on connections that are already being used.
I then loop through the records and post them to a TPL Dataflow block with the concurrency limit of 20.
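A minimal sketch of that stage (with a placeholder record type and posting loop; this is illustrative rather than my exact code):

using System.Threading.Tasks.Dataflow;

// Up to 20 uploads in flight at any time.
var uploadBlock = new ActionBlock<Record>(
    async record =>
    {
        // myApiClient and the Record-to-DocParams/DocData mapping are placeholders.
        await myApiClient.UploadNewDocumentAsync(record.DocParams, record.DocData);
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

foreach (var record in records)
{
    await uploadBlock.SendAsync(record);
}

uploadBlock.Complete();
await uploadBlock.Completion;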
This block makes the request through my API client, which uses a singleton HttpClient for all requests:
public class MyApiClient
{
    private static HttpClient _httpClient { get; set; }

    public MyApiClient()
    {
        _httpClient = new HttpClient();
    }

    public async Task<ReturnedObject> UploadNewDocumentAsync(DocParams docParams, DocData docData)
    {
        MultipartFormDataContent content = ConstructMultipartFormDataContent(docParams);
        HttpContent httpContent = docData.ConvertToHttpContent();
        content.Add(httpContent);

        using (HttpResponseMessage response = await _httpClient.PostAsync("document/upload", content))
        {
            return await HttpResponseReader.ReadResponse<ReturnedObject>(response).ConfigureAwait(false);
        }
    }
}
I'm using sysinternals TCPView.exe to view all connections made between the local host and remote host, as well as what the status of those connections are and whether data is actively being sent or not.
When starting the application I see new connections established for each request made, until around 50 connections are established. I also see activity for around 20 at any given time. This meets expectations.
After around 24 hours of activity, TCPView shows only 5 connections concurrently sending and receiving data. All 50 connections still exist and are all in the Established state, but the majority sit idle. I don't have a way of logging when connections stop being actively used. So I don't know whether all of a sudden it drops from using 20 connections to only 5, or whether it gradually decreases.
My log file records elapsed time for every request made and I also see a degradation in performance at this time. Requests take longer and longer to complete, and I see an increase in TaskCanceled exceptions as the HttpClient reaches its timeout value of 100 seconds.
I've also confirmed with the 3rd party API vendor that they are not receiving a large number of incoming requests timing out.
This suggests to me that the application is still trying to make 20 requests at a time, but they are being queued up on a smaller number of TCP connections. All the symptoms point to classic connection pool starvation.
While the application is running, I can output some information from the ServicePointManager to the console.
ServicePoint sp = ServicePointManager.FindServicePoint(myApiService.BasePath, WebRequest.DefaultWebProxy);
Console.WriteLine(
    $"CurrentConnections: {sp.CurrentConnections} " +
    $"ConnectionLimit: {sp.ConnectionLimit}");
Console output:
CurrentConnections: 50 ConnectionLimit: 50
This validates what I see in TCPView.exe, which is that all 50 connections are still allowed and are established.
The symptoms show connection pool starvation, but TCPView.exe and ServicePointManager show there are plenty of available established connections. .NET is just not using them all. This behaviour only shows up after several hours of runtime. The issue is repeatable. If I close and relaunch the application, it begins by rapidly opening all 50 TCP connections, and I see data being transferred on up to 20 at a time. When I check 24 hours later, the symptoms of connection pool starvation have shown up again.
What could cause the behavior, and is there anything further I could do to validate my assumptions?
Basic Intro:
Connection string: Host=IP;Port=somePort;Username=someUser;Password=somePass;Database=someDb;Maximum Pool Size=100
My web application has several dozen endpoints available via WS and HTTP. Every one of these endpoints opens a new Npgsql connection (all using the same connection string as above), processes data, then closes it via the using statement.
Issue:
When the application restarts for an update, there are typically 2,000-3,000 users all reconnecting. This leads to errors about the connection pool being full and new connections being rejected because there are too many clients already. However, once the application finally comes online, it typically only uses between 5 and 10 connections at any given time.
Question:
Is the logic below the proper way to use connection pooling, with every endpoint creating a new Npgsql connection using the same connection string that specifies a pool of 100?
It seems that the connection pool often shoots right up to 100, but ~80/100 of those connections are shown as idle in a DB viewer with new connection requests being denied due to pool overflow.
Better option?
I could also try and force a more "graceful" startup by slowly allowing new users to re-connect, but I'm not sure if the logic for creating a new connection with every endpoint is correct.
// DB connection string - used for all Npgsql connections
const string connectionStr = "Host=IP;Port=somePort;Username=someUser;Password=somePass;Database=someDb;Maximum Pool Size=100";

// Endpoint 1 available via Websocket
public async Task someRequest(someClass someArg)
{
    /* Create a new SQL connection for this user's request using the same global connection string */
    using var conn = new NpgsqlConnection(connectionStr);
    conn.Open();

    /* Call functions and pass this SQL connection for any queries to process this user request */
    somefunction(conn, someArg);
    anotherFunction(conn, someArg);

    /* Request processing is done */
    /* conn is closed automatically by the "using" statement above */
}

// Endpoint 2 available via Websocket
public async Task someOtherRequest(someClass someArg)
{
    /* Create a new SQL connection for this user's request using the same global connection string */
    using var conn = new NpgsqlConnection(connectionStr);
    conn.Open();

    /* Call functions and pass this SQL connection for any queries to process this user request */
    somefunction(conn, someArg);
    anotherFunction(conn, someArg);

    /* Request processing is done */
    /* conn is closed automatically by the "using" statement above */
}
// endpoint3();
// endpoint4();
// endpoint5();
// endpoint6();
// etc.
EDIT:
I've made the suggested change, closing connections and returning them to the pool during processing. However, the issue still persists on startup.
Application startup - 100 connections claimed for pooling. Almost all of them are idle. The application receives connection pool exhaustion errors, and few or no transactions are processed.
Transactions suddenly start churning; I'm not sure why. Is this after some sort of timeout, perhaps? I know there was a 300-second default timeout mentioned in the documentation somewhere... that might match up here.
Transactions lock up again, pool exhaustion errors resume.
Transactions spike and resume, user requests start coming through again.
Application levels out as normal.
EDIT 2:
This startup issue seems to consistently be taking 5 minutes from startup to clear a deadlock of idle transactions and start running all of the queries.
I know 5 minutes is the default value for idle_in_transaction_session_timeout. However, I tried running SET SESSION idle_in_transaction_session_timeout = '30s'; and 10s during the startup this time and it didn't seem to impact it at all.
I'm not sure why those 100 pooled connections would be stuck in idle like that on startup, taking 5 minutes to clear and allow queries to run if that's the case...
I had forgotten to update this post with some of the latest information. There were a few other internal optimizations I had made in the code.
One of the major ones was simply changing conn.Open(); to await conn.OpenAsync(); and conn.Close(); to await conn.CloseAsync();.
Everything else I had was properly async, but there was still I/O blocking when opening and closing new connections in Npgsql, causing worse performance with large bursts.
A very obvious change, but I didn't even think to look for async methods for opening and closing connections.
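For reference, a minimal sketch of that change, reusing the placeholder names from the snippets above:

// Endpoint handler using the async open/close pattern (sketch; names are placeholders).
public async Task someRequest(someClass someArg)
{
    using var conn = new NpgsqlConnection(connectionStr);
    await conn.OpenAsync();    // no longer blocks a thread while the physical connection is acquired

    somefunction(conn, someArg);
    anotherFunction(conn, someArg);

    await conn.CloseAsync();   // returns the connection to the pool without blocking
}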
A connection is released to the pool once you close it in your code. From what you wrote, you are keeping it open for the entire duration of a request, so basically 1 user = 1 connection, and pooling is just used as a waiting room (Timeout setting, 15 seconds by default). Open and close the connection each time you need to access the DB, so that while time is being spent in .NET code the connection is returned to the pool and can be used by another user.
Example, in pseudo code:
Enter function
Do some computations in .net, like input validation
Open connection (grab it from the pool)
Fetch info#1
Close connection (return it to the pool)
Do some computations in .net, like ordering the result, computing an age from a date etc
Open connection (grab it from the pool)
Fetch info #2
Close connection (return it to the pool)
Do some computations in .net
Return
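A rough C# translation of the pseudo code, assuming Npgsql; the fetch/compute helper methods and info types are placeholders:

public async Task HandleRequestAsync(someClass someArg)
{
    // Do some computations in .NET, like input validation
    Validate(someArg);

    Info1 info1;
    using (var conn = new NpgsqlConnection(connectionStr))
    {
        await conn.OpenAsync();                        // grab a connection from the pool
        info1 = await FetchInfo1Async(conn, someArg);
    }                                                  // disposed here: returned to the pool

    // Do some computations in .NET, like ordering the result
    var ordered = Order(info1);

    Info2 info2;
    using (var conn = new NpgsqlConnection(connectionStr))
    {
        await conn.OpenAsync();                        // grab another pooled connection
        info2 = await FetchInfo2Async(conn, ordered);
    }                                                  // returned to the pool again

    // Do some computations in .NET and return
    Process(info1, info2);
}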
I have a .NET Core application in which I am using C# and MongoDB. Within the application, I am using the MongoDB driver (version 2.7) for all database-related operations, and I have a MongoDB database (version 4.0.9). I am facing one strange issue and am not able to find the root cause. The very first request to the database takes significantly more time than subsequent requests. For example, the first request takes about 1 second, while immediate follow-up requests take ~200-250 milliseconds.
Does anyone know the solution to the above situation?
This is not an error; it is the default behavior of the C# driver. The driver only establishes the connection to the database server when the very first operation is initiated, and establishing that connection takes a few hundred milliseconds.
Subsequent operations do not need to establish new connections because of the driver's connection pooling. More connections are established only if they are really needed. If the app is not multi-threaded, the driver will usually open about 2 connections for the lifetime of the app, from what I have seen. If you inspect your MongoDB log file, this will be apparent.
My suggestion is to simply ignore the time it takes to initialize the connection if you're doing any kind of testing/benchmarking.
UPDATE:
If your database is hosted across a network, something like a firewall may be interfering with idle connections. If that's the case, you could try the following so that idle connections get recycled/renewed every minute:
MongoDefaults.MaxConnectionIdleTime = TimeSpan.FromMinutes(1);
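For this to take effect it should run before the first MongoClient is constructed; a minimal sketch (the connection string is a placeholder), with the per-client equivalent via MongoClientSettings shown as an alternative:

using System;
using MongoDB.Driver;

// Global default: set before any MongoClient is created.
MongoDefaults.MaxConnectionIdleTime = TimeSpan.FromMinutes(1);
var client = new MongoClient("mongodb://localhost:27017");   // placeholder connection string

// Alternative: configure the same limit per client.
var settings = MongoClientSettings.FromUrl(new MongoUrl("mongodb://localhost:27017"));
settings.MaxConnectionIdleTime = TimeSpan.FromMinutes(1);
var client2 = new MongoClient(settings);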
If all else fails, the only remaining option I can think of is to fire off a keep-alive task like the following:
public void InitKeepAlive()
{
    Task.Run(async () =>
    {
        while (true)
        {
            await client.GetCollection<Document>("Documents")
                        .AsQueryable()
                        .Select(d => d.Id)
                        .FirstOrDefaultAsync();

            await Task.Delay(TimeSpan.FromMinutes(1));
        }
    });
}
My program hangs on the .Open() method for longer than the specified 2-second timeout when connecting to SQL Server.
I managed to reproduce the problem when the machine the database is on is offline or restarting. When that machine comes back up, the program un-hangs.
How can I force an exception (or similar) to be thrown when the 2-second timeout is exceeded?
I tried both the OleDb and SqlConnection providers; no difference.
I can ping the machine before connecting to it, but there is still the case where the ping succeeds (say, 1 second before the machine shuts down) and then Open still hangs...
Example code provided below.
public static IDbConnection GetConnection(IDbConnection connection)
{
    connection.ConnectionString = "connectionString; Connection Timeout=2;";
    connection.Open();
    return connection;
}
The Connection Timeout property only covers the time for the connection to be created; everything after that (and there is a lot to do after the connection is established) does not count against it and may take indefinitely long (unless there is another timeout I'm not aware of).
What you can do is execute your own code in a separate thread with a watchdog to limit total execution time to two seconds. Using tasks it is pretty easy:
const int HardConnectionTimeoutInMilliseconds = 2000;

if (!Task.Run(() => connection.Open()).Wait(HardConnectionTimeoutInMilliseconds))
    return null; // Timeout!
Just for completeness this is old-style code for this:
Thread worker = new Thread(delegate ()
{
    connection.Open();
});
worker.Start();

if (!worker.Join(TimeSpan.FromSeconds(2)))
    return null;
Be careful with such a short timeout: for a TCP connection, two seconds is always too short, and if you're using Windows authentication with AD it may take longer than you expect.
In my opinion you have to live with this lag (15 to 30 seconds is a safe and reasonable time for a TCP connection with integrated security). You may also want to wait longer and retry (because errors may be temporary; see Know when to retry or fail when calling SQL Server from C#?). Note that the situation you're describing (the server going down) is pretty unusual, so IMO it shouldn't affect normal operations. If it's an issue for your UI, then you should make your program parallel (to keep the UI responsive).
Recently, I've been checking out RabbitMQ with C# as a way to implement pub/sub. I'm more used to working with NServiceBus. NServiceBus handles transactions by enlisting MSMQ in a TransactionScope. Other transaction-aware operations can also enlist in the same TransactionScope (like MSSQL), so everything is truly atomic. Underneath, NSB brings in MSDTC to coordinate.
I see that in the C# client API for RabbitMQ there are IModel.TxSelect() and IModel.TxCommit(). These work well for not sending messages to the exchange before the commit, and they cover the use case where multiple messages sent to the exchange need to be atomic. However, is there a good way to synchronize a database call (say, to MSSQL) with the RabbitMQ transaction?
You can write a RabbitMQ resource manager to be used by MSDTC by implementing the IEnlistmentNotification interface. The implementation provides two-phase-commit notification callbacks to the transaction manager once it enlists for participation. Please note that MSDTC comes with a heavy price and will degrade your overall performance drastically.
Example of RabbitMQ resource manager:
using System.Transactions;
using RabbitMQ.Client;

sealed class RabbitMqResourceManager : IEnlistmentNotification
{
    private readonly IModel _channel;

    public RabbitMqResourceManager(IModel channel, Transaction transaction)
    {
        _channel = channel;
        _channel.TxSelect();
        transaction.EnlistVolatile(this, EnlistmentOptions.None);
    }

    public RabbitMqResourceManager(IModel channel)
    {
        _channel = channel;
        _channel.TxSelect();
        if (Transaction.Current != null)
            Transaction.Current.EnlistVolatile(this, EnlistmentOptions.None);
    }

    public void Commit(Enlistment enlistment)
    {
        _channel.TxCommit();
        enlistment.Done();
    }

    public void InDoubt(Enlistment enlistment)
    {
        Rollback(enlistment);
    }

    public void Prepare(PreparingEnlistment preparingEnlistment)
    {
        preparingEnlistment.Prepared();
    }

    public void Rollback(Enlistment enlistment)
    {
        _channel.TxRollback();
        enlistment.Done();
    }
}
Example using the resource manager (note that the manager must be given the ambient Transaction, not the TransactionScope itself):

using (TransactionScope trx = new TransactionScope())
{
    var basicProperties = _channel.CreateBasicProperties();
    basicProperties.DeliveryMode = 2;

    // Enlist the channel in the ambient transaction created by the scope.
    new RabbitMqResourceManager(_channel, Transaction.Current);

    _channel.BasicPublish(someExchange, someQueueName, basicProperties, someData);
    trx.Complete();
}
As far as I'm aware there is no way of coordinating the TxSelect/TxCommit with the TransactionScope.
Currently the approach I'm taking is to use durable queues with persistent messages to ensure they survive RabbitMQ restarts. Then, when consuming from the queues, I read a message off, do some processing and insert a record into the database; once all this is done I ACK(nowledge) the message and it is removed from the queue. The potential problem with this approach is that a message could end up being processed twice (if, for example, the record is committed to the DB but the connection to RabbitMQ drops before the message can be ack'd), but for the system we're building we're mainly concerned about throughput. (I believe this is called the "at-least-once" approach.)
The RabbitMQ site does say that there is a significant performance hit using the TxSelect and TxCommit so I would recommend benchmarking both approaches.
Whichever way you do it, you will need to ensure that your consumer can cope with a message potentially being processed twice.
If you haven't found it yet, take a look at the .NET user guide for RabbitMQ here, specifically section 3.5.
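A rough sketch of the consume-process-ack flow described above, using the standard RabbitMQ.Client EventingBasicConsumer (the queue name and DB call are placeholders):

using RabbitMQ.Client;
using RabbitMQ.Client.Events;

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (sender, ea) =>
{
    // Do the processing and write the record to the database first...
    SaveToDatabase(ea.Body);                       // placeholder for the DB insert

    // ...then acknowledge, so the message is only removed once the DB write has succeeded.
    channel.BasicAck(ea.DeliveryTag, multiple: false);
};

// autoAck: false so that unacked messages are redelivered if the consumer dies,
// which is where the at-least-once / possible duplicate processing comes from.
channel.BasicConsume(queue: "my-durable-queue", autoAck: false, consumer: consumer);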
Let's say you've got a service bus implementation behind your abstraction IServiceBus. We can pretend it's RabbitMQ under the hood, but it certainly doesn't need to be.
When you call servicebus.Publish, you can check System.Transactions.Transaction.Current to see if you're in a transaction. If you are, and it's a transaction for a MSSQL Server connection, then instead of publishing to Rabbit you can publish to a Service Broker queue within SQL Server, which will respect the commit/rollback of whatever database operation you're performing (you want to do some connection magic here to avoid the broker publish escalating your transaction to MSDTC).
Now you need to create a service that reads the broker queue and does the actual publish to Rabbit. This way, for very important things, you can guarantee that your database operation completed previously and that the message gets published to Rabbit at some point in the future (when the service relays it). Failures are still possible here if an exception occurs while committing the broker receive, but the window for problems is drastically reduced, and in the worst-case scenario you would end up publishing multiple times; you would never lose a message. This is very unlikely: SQL Server going offline after the receive but before the commit would be an example of when you would end up at minimum double publishing (when the server comes back online you'd publish again). You can build your service to be smart and mitigate some of this, but unless you use MSDTC and all that comes with it (yikes), or build your own MSDTC (yikes yikes), you are going to have potential failures; it's all about making the window small and unlikely to occur.
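This is essentially an outbox-style relay. As a rough, simplified sketch of the database side (using a plain outbox table instead of actual Service Broker SEND/RECEIVE statements; the table, columns and variables are made up for illustration):

// Inside servicebus.Publish, when Transaction.Current belongs to a SQL Server connection:
// write the message into an outbox table on the same connection/transaction as the
// business operation, so it commits or rolls back together with it.
using (SqlCommand cmd = existingConnection.CreateCommand())
{
    cmd.Transaction = existingSqlTransaction;   // the caller's open transaction (assumed available)
    cmd.CommandText = "INSERT INTO MessageOutbox (Exchange, RoutingKey, Body) VALUES (@ex, @rk, @body)";
    cmd.Parameters.AddWithValue("@ex", exchange);
    cmd.Parameters.AddWithValue("@rk", routingKey);
    cmd.Parameters.AddWithValue("@body", messageBytes);
    cmd.ExecuteNonQuery();
}

// A separate relay service then polls MessageOutbox, publishes each row to RabbitMQ with
// channel.BasicPublish(...), and only deletes the row after the publish succeeds. If it
// crashes between publishing and deleting, the row is published again on restart, which is
// the at-least-once / possible double publish described above.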