IIS web server and thread pool issues - c#

The question relates to an ASP.NET 4.0, IIS-based Azure cloud service:
I need to know the right number of IOCP threads to set for a production web service that makes 10-20K remote calls/sec.
I also need to know the right number of worker threads to set, especially to handle 10-20K API calls/sec arriving in bursts.
Basically, the issue I am facing is that each of my cloud service VMs should handle 10-20K requests/sec, but it cannot do so due to a thread pool issue in ASP.NET.
My production service does nothing but fetch data from Redis and return it.

Assuming the code is efficient and there is enough hardware, i.e. there are no memory, CPU, or network bottlenecks:
1. Try to keep the IOCP thread count minimal, around 50-100.
2. Try to keep the worker (CPU) thread count high, to handle bursts of requests.
I am not sure it is a good idea to keep 2-5K active threads around to serve 10-20K requests/sec, though.
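To act on advice like the above, the thread pool minimums can be raised at application startup (for example in Global.asax `Application_Start`). A minimal sketch; the numbers are placeholders to illustrate the API, not tuned recommendations:

```csharp
// Sketch: raising ThreadPool minimums at startup so bursts don't wait
// on the pool's slow thread-injection rate. Tune against real load tests.
int minWorker = 200;  // assumed value for burst absorption
int minIocp = 100;    // assumed value for Redis/network completion callbacks
bool ok = System.Threading.ThreadPool.SetMinThreads(minWorker, minIocp);
if (!ok)
{
    // SetMinThreads returns false if the values are out of range
    // (e.g. below the processor count or above the current maximums).
}
```

The same minimums can also be set declaratively via `minWorkerThreads` / `minIoThreads` in the `processModel` section of machine.config.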

Related

Hundreds of threads from Worker Role connecting to SignalR on Web Role

My system has a Cloud Service with a Worker Role that reads messages from a queue (Azure Service Bus) and spawns a thread that uses the C# SignalR client to connect to a Cloud Service running a Web Role hosting the SignalR Hub. The worker thread runs for about 5 minutes doing various things including intermittently sending messages to the Hub - maybe 25 messages total. I am scaling out with Azure Service Bus topics - the default of 5. The Cloud Services are separate but reside in the same Virtual Network - the Worker Role points to the load balancer probes for the Web Role (but right now I am only running a single instance of each Role).
I am trying to determine the capacity of both the Worker Role (with the SignalR clients) and the Web Role (hosting the SignalR Hub).
I can run 200 concurrent threads on the Worker Role with each connecting, exchanging messages, and disconnecting cleanly. Neither Role experiences more than a 35% CPU spike during the testing. SignalR Performance counters all look great - there are no errors, no SSE or LP connections, and no scaleout queueing or scaleout errors.
When I try 300, suddenly all but 1 of my threads on the Worker Role cannot connect, and they experience TimeoutExceptions that read "Transport timed out trying to connect". I enabled tracing on the C# client in the Worker Role and I see that WebSockets, SSE, and LP all fail (Auto: Failed to connect to using transport webSockets/serverSideEvents/longPolling).
I am hoping to understand if:
a) my expectations are off - I expect to be able to have more than 200 concurrent connections from my Worker Role to my Web Role,
b) are the IIS settings for a Web Role adequate out of the box? Note that I have applied the SignalR performance changes supplied in the Wiki
c) are there Worker Role configurations / limitations with the number of concurrent connections I can make to a single source? Note that I applied the system.net configuration to allow a max of 1000.
d) is the type of Cloud Service size inhibiting me in any way? Both are set to "Medium" size, which is 2 cores and 3.5 GB. Am I short-changing anything by staying small? The idea was to find the limits of this server size and then be able to add more instances in real time as needed.
It should be stated that if I add instances, I can get past this limitation. But I want to understand why my current bottleneck is 200.
Any ideas or comments are welcome. I'm kind of stuck.
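Regarding point (c): the `system.net` connection-management config the question mentions can also be set programmatically from the client side. A minimal sketch, assuming the client's SignalR negotiation traffic goes through the standard HTTP stack, so the per-endpoint outbound connection limit applies:

```csharp
// Sketch: raising the outbound HTTP connection limit from the Worker Role.
// Programmatic equivalent of the <system.net><connectionManagement> setting;
// it must run before any connections to the endpoint are opened.
using System.Net;

class ClientStartup
{
    static void Configure()
    {
        // Matches the "max of 1000" the question says was applied in config.
        ServicePointManager.DefaultConnectionLimit = 1000;
    }
}
```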

Discrete .NET middleware processor vs spawning a new process from IIS

I have a 4 tier .NET application which consists of a
Silverlight 5 Client
MVC4 Web API Controller (Supplying data to the SL5 Client)
Windows Service - responsible for the majority of data processing.
Oracle DB storage.
The workflow is simple: the SL5 client sends a request to the REST service, and the REST service simply stores it in the DB.
The windows service, while periodically polling the DB for new records, detects the new records and attempts to process them accordingly. Once finished it updates the records and their status in the DB.
In the meantime the SL5 Client also periodically polls the DB to see if the records have been processed. When they are, the result is retrieved and rendered on the screen.
So the question here is the following:
Is there a difference between spawning the same processing code (currently in the windows service) in a new discrete process (right out of the Web API Controller), vs keeping it as is in the windows service?
Aside from removing the constant DB polling that happens in the windows service, it simplifies processing greatly because it can be done on a per-request basis as the requests arrive from the client. But are there any other drawbacks? Perhaps server or other issues with IIS?
Yes there is a difference.
Windows services are the right tool for asynchronous processing. Operations can take a long time without producing strange effects. After all, it is a continuously running service.
IIS, on the other hand, processes requests using a thread pool. Long-running tasks have the potential to exhaust that thread pool, which may cause problems depending on the number of background tasks you start. Also, IIS makes no guarantees about keeping long-running tasks alive. If the web site is recycled, which happens regularly in a default IIS installation, your background task may die.
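If some processing must nevertheless run inside IIS, ASP.NET 4.5.2 and later offer `HostingEnvironment.QueueBackgroundWorkItem`, which registers work with the runtime so a recycle waits briefly for it. A hedged sketch (note the .NET version requirement, which may rule it out for the MVC4 setup described above; the controller and method names are illustrative):

```csharp
// Sketch (requires .NET 4.5.2+): registering background work with the
// ASP.NET runtime so shutdown is delayed (by roughly 30 seconds at most)
// while it finishes. This does NOT make IIS a reliable host for
// genuinely long-running jobs like the 20-minute tasks discussed here.
using System.Threading;
using System.Web.Hosting;

public class RecordController : System.Web.Http.ApiController
{
    public void Post()
    {
        // ... store the record in the DB as before ...
        HostingEnvironment.QueueBackgroundWorkItem((CancellationToken ct) =>
        {
            // Process the record; honor ct, which is signaled on shutdown.
        });
    }
}
```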

What could be rate limiting CPU cycles on my C# WCF Service?

Something very strange started happening on our production servers a day or two ago regarding a WCF Service we run there: it seems that something started rate limiting the process in question's CPU cycles to the amount of CPU cycles that would be available on one core, even though the load is spread across all cores (the process is not burning one core to 100% usage)
The Service is mostly just a CRUD (create, read, update, delete) service, with the exception of a few long running (can take up to 20 minutes) service calls that exist there. These long running service calls kick off a simple Thread and return void, so as not to make the client application wait or hold up the WCF connection:
// WCF Service Side
[OperationBehavior]
public void StartLongRunningProcess()
{
    Thread workerThread = new Thread(DoWork);
    workerThread.Start();
}

private void DoWork()
{
    // Call SQL stored proc
    // Write the 100k+ records to a new Excel spreadsheet
    // return (which kills off this thread)
}
Before the above call is kicked off, the service seems to respond as it should, Fetching data to display on the front-end quickly.
When you kick off the long running process and CPU usage goes to 100 / CPUCores, the front-end response gets slower and slower, and after a few minutes the service won't accept any more WCF connections.
What I think is happening, is the long running process is using all the CPU cycles the OS is allowing, because something is rate limiting it, and WCF can't get a chance to accept the incoming connection, never mind execute the request.
At some point I started wondering if the Cluster our virtual servers run on is somehow doing this, but then we managed to reproduce this on our development machines with the client communicating to the service using the loopback address, so the hardware firewalls are not interfering with the network traffic either.
While testing this inside Visual Studio, I managed to start 4 of these long running processes and confirmed with the debugger that all 4 were executing simultaneously on different threads (by checking Thread.CurrentThread.ManagedThreadId), but still only using 100 / CPUCores worth of CPU cycles in total.
On the production server, it doesn't go over 25% CPU usage (4 cores); when we doubled the CPU cores to 8, it doesn't go over 12.5% CPU usage.
Our development machines have 8 cores and also won't go over 12.5% CPU usage.
Other things worth mentioning about the service
It's a Windows Service
It's running inside a TopShelf host
The problem didn't start after a deployment (of our service anyway)
Production server is running Windows Server 2008 R2 Datacenter
Dev Machines are running Windows 7 Enterprise
Things that we have checked, double checked, and tried:
Changed the process priority from Normal to High
Checked that the processor affinity for the process is not limiting it to a specific core
The [ServiceBehavior] attribute is set to ConcurrencyMode = ConcurrencyMode.Multiple
Incoming WCF service calls execute on different threads
Removed TopShelf from the equation by hosting the WCF service in a plain console application
Set the WCF Service throttling values: <serviceThrottling maxConcurrentCalls="1000" maxConcurrentInstances="1000" maxConcurrentSessions="1000" />
Any ideas on what could be causing this?
There must be a shared resource that only allows a single thread to access it at a time. This would effectively only allow one thread at a time to run, and create exactly the situation you have.
Processor affinity masks are the only way to limit a process to a single CPU, and if you did this you would see one CPU pinned and all the others idle (which is not your situation).
We use a tool called LeanSentry that is very good at identifying these kinds of problems. It will attach itself to IIS as a debugger and capture stack dumps of all executing processes, then tell you if most of your threads are blocked in the same spot. There is a free trial that would be long enough for you to figure this out.
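The shared-resource hypothesis is easy to reproduce with a toy program: N CPU-bound threads that all do their work inside one lock will collectively burn about one core's worth of CPU, spread across all cores, which matches the symptom described. A minimal sketch (all names are illustrative):

```csharp
// Toy reproduction of the "one core's worth of CPU" symptom:
// four threads doing CPU-bound work, but all inside one shared lock,
// so only one can execute the hot loop at any moment.
using System;
using System.Threading;

class LockContentionDemo
{
    static readonly object Gate = new object();

    static void Main()
    {
        for (int i = 0; i < 4; i++)
        {
            new Thread(() =>
            {
                while (true)
                {
                    lock (Gate)      // serializes all four threads
                    {
                        SpinCpu();   // CPU-bound work held under the lock
                    }
                }
            }) { IsBackground = true }.Start();
        }
        // Watch this process in Task Manager: on a 4-core box total CPU
        // hovers around 25%, spread across cores, not one core pinned.
        Console.ReadLine();
    }

    static void SpinCpu()
    {
        double x = 1.0;
        for (int i = 0; i < 1000000; i++) x = Math.Sqrt(x + i);
    }
}
```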
The CPU usage looks to me like a lock on a table in the SQL database. I would use SQL Server Management Studio to analyze the statements and see if that confirms it.
You also indicated that you call a stored procedure; it would be worth looking at that as well.
This all just looks like a database issue to me.

Soft shutdown of a cloud role so it ignores new action requests?

I was wondering if it's possible to do a 'soft shutdown' or 'soft reboot' of a cloud service. In other words the server would refuse new incoming http requests (which come in through ASP.net controller actions), but would finish all existing requests that are in progress. After this happens the server would then shutdown or stop as normal.
Server Version
Azure OS Family 3 Release
Windows Server 2012
.NET 4.5
iis-8.0
asp.net 4.0
Usage Scenario
I need to ensure that any actions responding to remote http requests currently in progress finish before a server begins the process of shutting down or becoming unresponsive because of a staging to production swap.
I've done some research, but don't know if this is possible.
A hacky work around might be using a CloudConfigurationManager variable to initiate that an error 503 code should be returned on any incoming actions over http, but then I'd have to sit around and wait for a while without any way to verify that condition. At that point I could then stop the service or perform a swap.
See http://azure.microsoft.com/blog/2013/01/14/the-right-way-to-handle-azure-onstop-events/ for information on how to drain HTTP requests when a role is stopping (attaching image below, I don't know why the source uses an image instead of text...):
Also note that doing a VIP swap won't affect the role instances themselves or any TCP connections to the instances, so nothing should become unresponsive just because you do a VIP swap. Once you begin shutting down the staging deployment after a VIP swap that is when the code above will help drain the requests before actually shutting down.
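Since the linked post's code only survives as an image, here is a hedged reconstruction of the draining pattern it describes: in the role's OnStop, poll the ASP.NET "Requests Current" performance counter and sleep until in-flight requests have finished. The counter category and name are as I recall them from that post and should be verified on the target OS:

```csharp
// Sketch: block OnStop until in-flight ASP.NET requests have drained.
// Azure gives OnStop a limited window (on the order of minutes) before
// forcing shutdown, so this only helps requests that can finish in time.
using System.Diagnostics;
using System.Threading;

public override void OnStop()
{
    Trace.TraceInformation("OnStop called from WebRole");
    var rcCounter = new PerformanceCounter("ASP.NET", "Requests Current", "");
    while (rcCounter.NextValue() > 0)
    {
        Trace.TraceInformation("ASP.NET Requests Current = " + rcCounter.NextValue());
        Thread.Sleep(1000);
    }
}
```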

ASP.NET Single Worker Thread? (In Memory Session)

I'm using in-memory sessions in my ASP.NET MVC application, which means I can only have a single worker thread, correct? Does this mean I have parallel processing in my application (think concurrent requests) or not? Does my application accept only one request at a time?
Edit: According to the IIS7 web site:
If the application uses in-process session variables, the application will not function correctly, because the same user requests are picked up by different worker processes that do not share the same session details.
So this means in-memory session can only have 1 worker thread or not? See also here from the IIS7 forums.
Your application receives multiple requests at a time. Based on your edits and comments here you are looking for information on Web Gardens and sessions, as opposed to threads and session state.
Web Gardens use multiple processes and act like a load balancer when it comes to session state. Each process will have a separate in memory session store. IIS will send requests to any available process. Since the processes do not share session state then session usage will only really work if your session provider is shared between all the web garden processes.
Web Gardens only make sense if you use something like SQL Server for session state, and want to have affinity with a particular CPU/core. Since you can increase the number of threads this may be a more appropriate optimization for some users than using web gardens. However some applications may perform better with web gardens due to a particular work load or application characteristic. Use of web gardens in testing could also help work out some potential issues present under load balancing.
I believe it uses the .NET ThreadPool, which has 20 threads by default. Each request coming into the server may be handled on a separate thread. The ASP.NET performance guidelines have some information on this topic.
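Rather than assuming a fixed default like 20, the live thread pool limits can be inspected directly; a minimal sketch:

```csharp
// Sketch: inspecting the ThreadPool limits the runtime is actually using.
// Defaults vary by .NET version, bitness, config, and core count,
// so checking beats assuming a fixed number.
using System;
using System.Threading;

class ThreadPoolInfo
{
    static void Main()
    {
        int minWorker, minIocp, maxWorker, maxIocp;
        ThreadPool.GetMinThreads(out minWorker, out minIocp);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIocp);

        Console.WriteLine("Worker threads: {0}-{1}", minWorker, maxWorker);
        Console.WriteLine("IOCP threads:   {0}-{1}", minIocp, maxIocp);
    }
}
```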
In memory sessions means you should typically only have one front-end web server at a time, not a single worker thread :D
The reason being that any information stored in session on one machine is not available on the other. If you have two front-end web servers and your proxy or firewall does "load-balancing" whereby it will randomly assign requests to web servers, then you will have problems. That said, the problem is easily solved with so called "sticky sessions" where users are always sent to the same server.
-Oisin
