Getting spikes in gRPC calls every hour - c#

I have a website running with around 7 servers (C#). And there is a gRPC service (golang) running with 3 instances. Each web server connects to and makes calls to the gRPC service. There are around 8000 calls per minute to the service.
The call to this service is not that critical, so we recently reduced its deadline to 20 milliseconds. That is when we noticed something strange: a spike in "deadline exceeded" errors every hour throughout the day, exactly at the 0th minute (2pm, 3pm, 4pm, etc.).
Why does this happen?
I came across this link saying gRPC resets the connection every hour, but nothing more than that.
So my question is: does gRPC internally refresh the connection every hour? If yes, is there any way to tweak this behavior? If no, can someone give some direction on how I can debug why this is happening?

No, grpc-go does not refresh connections. The only time it initiates a disconnect is if you configure "max idle" (ref) and the connection has been idle for longer than that time limit. By default this is disabled, so it's unlikely to be the culprit in this case.

Related

Web api calls are slow in azure

I'm developing a web api and I hosted it on azure, I have a call that takes about 2.5 seconds in my local machine but takes a lot longer when the app is hosted in azure as you can see in this figure:
It's taking 12.8 seconds, which is not expected. Why is this happening, and what is the part highlighted in red? Why does it take about 10 seconds to start the first operation in the code? I have "AlwaysOn" set to ON, so this is not my API going to sleep. Also, the call sometimes takes less time (4-6 seconds), which is inconsistent. Please enlighten me.
If CPU usage is not high, one reason could be SNAT port exhaustion or pending SNAT ports: if you have too many open TCP connections (including those to SQL Server), new connections have to wait.
You can check that from your app service "Diagnose and solve problems" -> "Availability and Performance" -> "SNAT Port Exhaustion".
If this is the case, this is a good place to start: https://learn.microsoft.com/en-us/aspnet/web-api/overview/advanced/calling-a-web-api-from-a-net-client
Have you tried increasing the tier of your app service plan? This will help you understand whether it's an infrastructure problem or a code problem.

How to troubleshoot MaxConcurrentSessions exceeded in IIS hosted WCF Service

I'm way out of my comfort zone so bear with me on providing the relevant information. We have just moved an IIS hosted WCF service to a new server and clients calling this service started experiencing timeouts. It does OK for about 10 minutes after recycling the app pool and then everything begins timing out. We enabled WCF tracing, where I can see it saying that MaxConcurrentSessions has been exceeded. The documentation says that value defaults to 100 x [# of processors] so it should be 200 for us.
The server is behind a load balancer, but is currently the only server. We notice the connections hang out at around 6 per second in Performance Monitor but will climb up to around 30 when the timeouts happen and continue climbing up from there.
The clients are connecting using a wsHttpBinding with TransportWithMessageCredential security. The service validates the credentials provided in the message using the ASP.NET membership provider in a custom UserNamePasswordValidator configured for use on the server binding behavior. The clients do not enable reliableSession on their bindings. The service uses the default SessionMode and InstanceContextMode, which I believe are Allowed and PerSession respectively? We do not call Close on the service proxies because in a past investigation I found that this only sets a flag on the object preventing it from being re-used, and ours always go out of scope anyway... but we are now testing to see if this does close the connection.
If I'm interpreting the WCF trace log correctly (and I don't understand the majority of what I'm reading there), it appears we are processing around 30-40 messages per minute and each request is completed in less than 300ms (usually much less, on rare occasions nearly 1s). I determined the number of messages by counting the "Processing message n" entries over a few 1 min spans. So if we're getting 40 per minute and it takes 100s for those connections/sessions to time out and close, we would still only have about 67 open at once before the first ones begin to time out (40 per minute x 100s / 60). Not close to the 200 limit. Does the connection for a single client request get more than one session?
The strange thing is we didn't have any timeouts before and copied the service and web.config straight over to the new server. I believe the server and IIS versions were upgraded (server 2016, IIS 10.) Can you please help me identify and provide the relevant information to track down the problem causing these timeouts?
Edit:
From my reading, everything seems to indicate that the client must call Close, otherwise the server will leave the connection open until it times out. However, in our test, we see one connection created in Performance Monitor but it remains open after Close has been called anyway. So I can't determine if the need to call Close is a rumor or if we are misinterpreting our monitoring. The real test would be to call Close everywhere and see if it eliminates our timeouts.
After increasing our MaxConcurrentSessions to 400, in performance monitor, we saw the number of concurrent sessions and instances steadily rise by about 1 per second up to about 225 where it finally leveled off and it's hovering around there. So it seems like sessions are not being closed.
Well, we figured it out. Nothing just popped up and told us what the problem was, and it took a lot of brainstorming, but here's what we did:
Enabled WCF tracing. Went through the traces and was able to understand enough to see that the traffic didn't look out of the ordinary. All of the events seemed to be for the expected amount and types of service calls. Viewing them in svctraceviewer, it didn't seem to be a DoS attack or anything like that. We just used the default configuration from that link, but it looks like it can be heavily customized to provide the specific information you're after if you know what that is.
What really helped in this case was finding the WCF Performance Counters. Initially we were using ASP.NET performance counters to look at sessions open which was not the right metric. This codeproject guide helped us enable the WCF performance counters to give us an insight into the number of sessions and the limit in real time.
It also helped to brush up on how WCF sessions and instances are related as well as creation of a security context:
https://www.codeproject.com/Articles/188749/WCF-Sessions-Brief-Introduction
http://webservices20.blogspot.com/2009/01/wcf-performance-gearing-up-your-service.html
https://learn.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/hh273122(v=vs.100)
We were able to see the percentage of the max WCF sessions being used, and observed it climbing higher and higher towards the default limit of 200 (100 per processor) but eventually level off between 150 and 200. This leveling off, together with far more sessions existing at a given time than the average number of requests per minute seen in our WCF tracing, indicated that sessions were closing but seemed to be remaining open until they timed out rather than closing as soon as the server completed the request.
Somewhere on Stack Overflow, that I've been unable to find, I once asked about the purpose of the ClientBase<TChannel>.Close method (a.k.a. the Close method of a WCF service proxy) and, somewhat incorrectly, came to the conclusion that all it did is set a flag on the proxy object marking it closed so that it couldn't be used again. The documentation's description of the method seems in line with that:
Causes the ClientBase<TChannel> object to transition from its current state into the closed state.
Well, at the point that I would call Close, my references always just go out of scope anyway, allowing garbage collection to clean them up, so that seemed pointless. But I think a key factor was that this was regarding basicHttpBindings, which are stateless. In this case, we are using wsHttpBindings, which are stateful, meaning the server keeps the session and leaves the connection open after it completes the request so that subsequent calls from the client can be made on the same connection. So, though I couldn't find any documentation or track down in the source code where it happens, it seems WCF clients must call Close on their service proxy after they make their last request in order to tell the server it can close the connection and free up that session slot. I didn't have the opportunity to look for a message sent to the server upon calling Close, but we were able to observe, using the Performance Counter, the number of sessions dropping from 1 to 0, where before it would remain at 1 after our client called the service.
But we're saying a WCF client, who we may have no control over, is able to harm server performance and possibly create a denial of service if they aren't diligent in their coding and forget to call Close, and the server has no control over its own performance?? That sounds like a recipe for disaster. Well, there are two things you can do on the server to mitigate this.
First, you can increase the max number of sessions. In our case we were hovering around 175 but occasionally exceeding 200 under traffic spikes. We bumped it up to 800 temporarily to ensure we wouldn't exceed the max. The trade-off is dedicating more server resources to holding sessions that will probably never be used again until they time out.
Second, and luckily, the server also controls the timeout. The service can control how long these sessions are held open using the ReceiveTimeout and the InactivityTimeout. Both default to 10 minutes, and the lesser of the two is used. If you're thinking, "Receive timeout sounds wrong. That controls the amount of time the service can take to receive a large message", you're not alone. However, that's incorrect. On the server side:
ReceiveTimeout – used by the Service Framework Layer to initialize the session-idle timeout which controls how long a session can be idle before timing out.
And on the client-side it is not used. So we set our ReceiveTimeout to 30 seconds and the sessions dropped significantly. That may have actually been too low because some spots in code that do re-use the service proxy (making multiple calls in a loop for instance, or doing some data processing in between calls) are now getting an error when trying to call the service after the session has been closed. So you will have to find the right balance. But best practice, it seems, is to close your connections.
One gotcha to watch out for is using Dispose on your service proxy. I had always tried typing .dispo to see if IntelliSense would pop up the Dispose method on my proxy and found that it didn't, so I assumed it didn't implement IDisposable and didn't need to be closed or disposed. It turns out it does implement IDisposable, but it does so explicitly, so you'd have to cast it to IDisposable to call Dispose on it. But wait! Don't go putting your proxy in a using statement just yet. The implementation of Dispose sillily just calls Close on the proxy, which will throw an exception if the proxy is in the faulted state (i.e. if a service call threw an exception). So you can't safely do something like this:
using (MyWcfClient proxy = new MyWcfClient())
{
    try
    {
        proxy.Calculate();
    }
    catch (Exception)
    {
    }
}
because if Calculate throws an exception, the closing bracket of the using block will also throw an exception when it tries to dispose your proxy. Instead you just have to call Close after your last service method call. Evidently you can also call Abort in the catch, but I'm not sure if that actually communicates with the server to end the session.
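MyWcfClient proxy = new MyWcfClient();
try
{
    proxy.Calculate();
    // Close after the last service call so the server can release the session.
    proxy.Close();
}
catch (Exception)
{
    // Close would throw on a faulted proxy, so abort it instead.
    proxy.Abort();
}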
Addendum
We surmise that the reason we started experiencing this when moving servers, and not before, is that we were using Barracuda products before and are now using Oracle, and perhaps the old load balancer or firewall was closing open connections for us.

Time-consuming communication with HttpClient, WebApi and WCF

I need help understanding where my time problem is. I have a WinForms/WPF application which communicates with a WCF service through a Web API 2 service and a System.Net.Http.HttpClient.
Client => HttpClient => webapi => wcf service.
When I deploy and run this, the first call takes a very long time to get an answer back. But the second and later calls are very fast.
If I don't run it for a while, it goes to sleep again.
Why is it so slow in the beginning, and what should I look at?
On the first call, Web API has to initialize (IIS has to start the API, and by default IIS starts it only on the first request). This takes some time. In IIS, the default AppPool Idle Time-out (minutes) is 20, so after 20 idle minutes the app goes to sleep and IIS has to wake it up again on the next request.
WebApi why 1st call is slow?
Almost the same problem is with WCF
WCF why 1st call is slow?
So in your app you have a slow first API call, followed by a slow first WCF call. The slowness is doubled.

How To Keep My Web Services Awake?

I have written two web services that I am running on GoDaddy. One is a Microsoft WCF web service and the other is a RESTful Web API service. They are both working, but they rarely get traffic. If I don't call the web services for some period of time they seem to go to sleep. Then when I load the pages that call the web services they take some 20 to 30 seconds to retrieve data from the services. After that if I continue to call them repeatedly they load in just a second or two. Is this normal or did I do something wrong in my configuration? Is there some way to keep them active?
Entirely normal. You can either increase the recycle time limit in IIS (but you will still get recycled eventually) or you can write a quick scheduled task like the following to run every 10 minutes or so:
powershell Invoke-WebRequest -Uri "http://example.com"
Although I would caution that you should forcefully restart the service sometime during low usage hours just to clear the process memory / resource utilization.

An unsecured or incorrectly secured fault was received from the other party

I have a Windows program that calls a WCF service. After a few calls the service becomes very slow and eventually this error occurs. After a restart, the service works again.
You are probably not closing the connection to the WCF service.
WCF has a default of 10 connections and a timeout of one minute.
What then happens is that the first 10 hits go OK. The 11th has to wait for an available connection, which it gets after one minute when the first connection times out.
The solution is therefore to make sure that you are closing the WCF connections.
This happened to me on a newly provisioned server where the time on the box was setup incorrectly. Once the server time was correct this error message went away.
