I am running an ASP.NET service. The service starts returning - "Service Unavailable - 503" under high loads.
Previously it was able to cope with these loads, I still am investigating why that is happening now.
I see a high requests rejected rate (via the ASP.NET perf counter) ; however the requests queued rate (via the ASP.NET perf counter) varies from deployment to deployment from 1 to 150. For some deployments that show a high requests rejected rate, I can correlate that to the high requests queued rate. However, for some deployments the requests queued is low - 1-5 but the requests rejected rate is high.
Am I missing something here? Any pointers on how to investigate this issue further?
I'd take a peek with a profiler, to see if your getting load in other areas that you weren't before such as syncronous DB and network calls.
Look at newRelic (simple to use) and identify the bottlenecks, so simple code changes may help you get out of your immediate hole.
Moving forward look into making the code base more async (if it isn't already).
Related
I'm currently developing website in asp core 2.2. This site use external API. But I have one big problem and don't know how to solve this. This external API has limit 10 reguest per IP/s. If 11 user click button on my site and call API at the same time, the API can cut me off for a couple hours. The API owner tells clients to take care of not exceeding the limit. Can you have any idea how doing this?
ps. Of course, a million users are a joke, but I want the site to be publicly available :)
That 10 request/s is a hard limit and it seems like theres no way around it. So you have to solve it on your end.
There are couple options:
Calls that API directly using Javascript. This way each user will be able to do 10 request/s instead of 10 request/s for all users (recommended)
Queue the requests and only send out at most 10/s (highly not recommended, kills your thread pool and can block everyone from accessing your site when the speed of input coming is > output)
Drop the request on server side when you are reaching that 10/s limit and have the client retry at a later time. (wait time will be infinite when speed of input coming is > output)
And depending on the content returned by the API you might be able to cache it on server side to avoid having to request it from the 3rd party again.
In this scenario you would need to account for the possibility that you can't process requests in real time. You wouldn't want to have thousands of requests waiting on access to a resource that you don't control.
I second the answer about calling the API from the client, if that's an option.
Another option is to keep a counter of current requests, limit it to ten, and return a 503 error if a request comes in that exceeds that capacity. That's practical if you really don't expect to exceed ten concurrent requests often or ever but want to be sure that in the odd chance that it happens it doesn't shut down this feature of your site.
If you actually expect large volumes where you would exceed ten concurrent requests then you would need to queue the requests, but do it in a process separate from your web application. As mentioned, if you have tons of requests waiting for the same resource your application will become overloaded. You could enqueue the request with an entirely different process, and then the client would have to poll your application with occasional requests to see if there's a response.
The big flaw in this last scenario is that it means your users could end up waiting a long time because your application depends on a finite resource that you cannot scale. You can manage it in a way that keeps your application from failing, but not in a way that makes it respond quickly.
I'm sending about ~3K HTTP requests per second to some HTTP servers to use an API that my service relies on.
The requests are made asynchronously by calling BeginGetResponse().
99% of the responses are 404's, which is expected, and my code handles those responses without issue. My CPU usage stays pretty low (~5-10%)
However, sometimes the servers on the API's end start experiencing problems, and every request i make gets returned a 503 error. These are handled exactly the same way as 404's, they're ignored.
Yet with the 503's the CPU usage is huge, it stays at ~60% the entire time the 503's come in. The result is that when the API servers I'm speaking with have issues, my entire server goes down as well.
Why do 503 responses from the HTTP server cause such high CPU usage compared to 404's? Do the 503's reset the TCP connections in the connection pool?
Something very strange started happening on our production servers a day or two ago regarding a WCF Service we run there: it seems that something started rate limiting the process in question's CPU cycles to the amount of CPU cycles that would be available on one core, even though the load is spread across all cores (the process is not burning one core to 100% usage)
The Service is mostly just a CRUD (create, read, update, delete) service, with the exception of a few long running (can take up to 20 minutes) service calls that exist there. These long running service calls kicks of a simple Thread and returns void so not to make the Client application wait, or hold up the WCF connection:
// WCF Service Side
[OperationBehavior]
public void StartLongRunningProcess()
{
Thread workerThread = new Thread(DoWork);
workerThread.Start();
}
private void DoWork()
{
// Call SQL Stored proc
// Write the 100k+ records to new excel spreadsheet
// return (which kills off this thread)
}
Before the above call is kicked off, the service seems to respond as it should, Fetching data to display on the front-end quickly.
When you kick off the long running process, and the CPU usage goes to 100 / CPUCores, the front-end response gets slower and slower, and eventually wont accept any more WCF connections after a few minutes.
What I think is happening, is the long running process is using all the CPU cycles the OS is allowing, because something is rate limiting it, and WCF can't get a chance to accept the incoming connection, never mind execute the request.
At some point I started wondering if the Cluster our virtual servers run on is somehow doing this, but then we managed to reproduce this on our development machines with the client communicating to the service using the loopback address, so the hardware firewalls are not interfering with the network traffic either.
While testing this inside of VisualStudio, i managed to start 4 of these long running processes and with the debugger confirmed that all 4 are executing simultaneously, in different threads (by checking Thread.CurrentThread.ManagedThreadId), but still only using 100 / CPUCores worth of CPU cycles in total.
On the production server, it doesn't go over 25% CPU usage (4 cores), when we doubled the CPU cores to 8, it doesn't go over 12.5% CPU usage.
Our development machines have 8 cores, and also wont go over 12.5% CPU usage.
Other things worth mentioning about the service
Its a Windows Service
Its running inside of a TopShelf host
The problem didn't start after a deployment (of our service anyway)
Production server is running Windows Server 2008 R2 Datacenter
Dev Machines are running Windows 7 Enterprise
Things that we have checked, double checked, and tried:
Changing the process' priority up to High from Normal
Checked that the processor affinity for the process is not limiting to a specific core
The [ServiceBehavior] Attribute is set to ConcurrencyMode = ConcurrencyMode.Multiple
Incoming WCF Service calls are executing on different threads
Remove TopShelf from the equation hosting the WCF service in just a console application
Set the WCF Service throttling values: <serviceThrottling maxConcurrentCalls="1000" maxConcurrentInstances="1000" maxConcurrentSessions="1000" />
Any ideas on what could be causing this?
There must be a shared resource that only allows a single thread to access it at a time. This would effectively only allow one thread at a time to run, and create exactly the situation you have.
Processor affinity masks are the only way to limit a process to a single CPU, and if you did this you would see one CPU pinned and all the others idle (which is not your situation).
We use a tool called LeanSentry that is very good at identifying these kinds of problems. It will attach itself to IIS as a debugger and capture stack dumps of all executing processes, then tell you if most of your threads are blocked in the same spot. There is a free trial that would be long enough for you to figure this out.
The CPU usage looks like a lock on a table in the SQL Database to me. I would use the SQL management studio to analyze the statements see if it can confirm that.
Also you indicated that you call a stored procedure might want to have it look at that as well.
This all just looks like a database issue to me
We have an application (a meta-search engine) that must make 50 - 250 outbound HTTP connections in response to a user action, frequently.
The way we do this is by creating a bunch of HttpWebRequests and running them asynchronously using Action.BeginInvoke. This obviously uses the ThreadPool to launch the web requests, which run synchronously on their own thread. Note that it is currently this way as this was originally a .NET 2.0 app and there was no TPL to speak of.
Using ETW (our event sources combined with the .NET framework and kernal ones) and NetMon is that while the thread pool can start 200 threads running our code in about 300ms (so, no threadpool exhaustion issues here), it takes up a variable amount of time, sometimes up to 10 - 15 seconds for the Windows kernel to make all the TCP connections that have been queued up.
This is very obvious in NetMon - you see around 60 - 100 TCP connections open (SYN) immediately (the number varies, but it's never more then around 120), then the rest trickle in over a period of time. It's as if the connections are being queued somewhere, but I don't know where and I don't know how to tune this to we can perform more concurrent outgoing connections. Perfmon Outbound Connection Queue stays at 0 but in the Connections Established counter you can see an initial spike of connections then a gradual increase as the rest filter through.
It does appear that latency to the endpoints to which we are connecting play a part, as running the code close to the endpoints that it connects to doesn't show the problem as significantly.
I've taken comprehensive ETW traces but there is no decent documentation on many of the Microsoft providers, which would be a help I'm sure.
Any advice to work around this or advice on tuning windows for a large amount of outgoing connections would be great. The platform is Win7 (dev) and Win2k8R2 (prod).
It looks like slow DNS queries are the culprit here. Looking at the ETW provider "Microsoft-Windows-Networking-Correlation", I can trace the network call from inception to connection and note that many connections are taking > 1 second at the DNS resolver (Microsoft-Windows-RPC).
It appears our local DNS server is slow/can't handle the load we are throwing at it and isn't caching aggressively. Production wasn't showing as severe symptoms as the prod DNS servers do everything right.
I am using SOAP in C# .Net 3.5 to consume a web service, from a video game company. I am having lots of SOAP Exceptions with the error "Operation Timed Out"
While one process is timing out, others fly by with no problems. I would like to rule out a problem on my end, but I have no idea where to begin. My timeout is 5 minutes. For every 5,000 requests, maybe 500 fail.
Anyone have some advice for diagnosing web services failures? The web service owner will probably give no support to helping me on this, as it's a free service.
Thanks
I've had to do a lot of debugging connecting to a SOAP Service using PHP and timeouts are the worst problem. Normally the problem is the 'client' doesn't have a high enough timeout and bombs after something like 30s.
I test making the calls using SoapUI. I keep using a higher client-side timeout using that until I find something that works. Once I find that out I use the newly found time to my client and re-test.
Your only solution may be to make sure your 'clients' have a high enough timeout that will work for everything. 5 minutes should be fine for most of your server-side timeouts.
OK this is a huge question and there is a lot that it could be.
Have you tackled HTTP two connection limit? http://madskristensen.net/post/Optimize-HTTP-requests-and-web-service-calls.aspx
Have you got enough IO threads to cater for the load? Use the performance monitoring to check this for your App Pool - I think there is a IO threads counter. A quick google turned this up - http://www.guidanceshare.com/wiki/ASP.NET_2.0_Performance_Guidelines_-_Threading
Are you exhausting your bandwidth? Use performance monitoring again to check the usage of your network card.
This is a really hard subject to broach textually, as it so dependent on environment but I hop these might help.
This also looks interesting - http://software.intel.com/en-us/articles/how-to-tune-the-machineconfig-file-on-the-aspnet-platform/