Parallel.ForEach keeps spawning new threads - C#

While I was using Parallel.ForEach in my program, I found that some threads never seemed to finish. In fact, it kept spawning new threads over and over, a behaviour that I wasn't expecting and definitely don't want.
I was able to reproduce this behaviour with the following code which, just like my 'real' program, uses both processor and memory heavily (.NET 4.0 code):
public class Node
{
    public Node Previous { get; private set; }

    public Node(Node previous)
    {
        Previous = previous;
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        DateTime startMoment = DateTime.Now;
        int concurrentThreads = 0;

        var jobs = Enumerable.Range(0, 2000);
        Parallel.ForEach(jobs, delegate(int jobNr)
        {
            Interlocked.Increment(ref concurrentThreads);

            int heavyness = jobNr % 9;

            //Give the processor and the garbage collector something to do...
            List<Node> nodes = new List<Node>();
            Node current = null;
            for (int y = 0; y < 1024 * 1024 * heavyness; y++)
            {
                current = new Node(current);
                nodes.Add(current);
            }

            TimeSpan elapsed = DateTime.Now - startMoment;
            int threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
            Console.WriteLine("[{0:mm\\:ss}] Job {1,4} complete. {2} threads remaining.",
                elapsed, jobNr, threadsRemaining);
        });
    }
}
When run on my quad-core, it initially starts off with 4 concurrent threads, just as you would expect. However, over time more and more threads are created. Eventually, the program throws an OutOfMemoryException:
[00:00] Job 0 complete. 3 threads remaining.
[00:01] Job 1 complete. 4 threads remaining.
[00:01] Job 2 complete. 4 threads remaining.
[00:02] Job 3 complete. 4 threads remaining.
[00:05] Job 9 complete. 5 threads remaining.
[00:05] Job 4 complete. 5 threads remaining.
[00:05] Job 5 complete. 5 threads remaining.
[00:05] Job 10 complete. 5 threads remaining.
[00:08] Job 11 complete. 5 threads remaining.
[00:08] Job 6 complete. 5 threads remaining.
...
[00:55] Job 67 complete. 7 threads remaining.
[00:56] Job 81 complete. 8 threads remaining.
...
[01:54] Job 107 complete. 11 threads remaining.
[02:00] Job 121 complete. 12 threads remaining.
..
[02:55] Job 115 complete. 19 threads remaining.
[03:02] Job 166 complete. 21 threads remaining.
...
[03:41] Job 113 complete. 28 threads remaining.
<OutOfMemoryException>
The memory usage graph for the experiment above (the screenshot, not reproduced here, was in Dutch; the top part showed processor usage, the bottom part memory usage) suggests that a new thread is being spawned almost every time the garbage collector gets in the way, as can be seen in the dips of memory usage.
Can anyone explain why this is happening, and what I can do about it? I just want .NET to stop spawning new threads, and finish the existing threads first...

You can limit the maximum number of threads that get created by specifying a ParallelOptions instance with the MaxDegreeOfParallelism property set:
var jobs = Enumerable.Range(0, 2000);
ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.ForEach(jobs, po, jobNr =>
{
    // ...
});
As to why you're getting the behaviour you're observing: The TPL (which underlies PLINQ) is, by default, at liberty to guess the optimal number of threads to use. Whenever a parallel task blocks, the task scheduler may create a new thread in order to maintain progress. In your case, the blocking might be happening implicitly; for example, through the Console.WriteLine call, or (as you observed) during garbage collection.
From Concurrency Levels Tuning with Task Parallel Library (How Many Threads to Use?):
Since the TPL default policy is to use one thread per processor, we can conclude that TPL initially assumes that the workload of a task is ~100% working and 0% waiting, and if the initial assumption fails and the task enters a waiting state (i.e. starts blocking) - TPL will take the liberty to add threads as appropriate.
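Here is a minimal sketch (my own illustration, not code from the question) that makes the injection visible: every loop body blocks in Thread.Sleep, so the scheduler keeps adding threads and the observed concurrency climbs well past the core count.
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class InjectionDemo
{
    static int _concurrent;

    static void Main()
    {
        Parallel.ForEach(Enumerable.Range(0, 100), i =>
        {
            int now = Interlocked.Increment(ref _concurrent);
            Console.WriteLine("Concurrency: {0} (cores: {1})", now, Environment.ProcessorCount);
            Thread.Sleep(1000); // a blocked task looks like "0% working" to the TPL
            Interlocked.Decrement(ref _concurrent);
        });
    }
}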

You should probably read a bit about how the task scheduler works.
Parallel Programming with Microsoft .NET - Parallel Tasks
(latter half of the page)
"The .NET thread pool automatically manages the number of worker
threads in the pool. It adds and removes threads according to built-in
heuristics. The .NET thread pool has two main mechanisms for injecting
threads: a starvation-avoidance mechanism that adds worker threads if
it sees no progress being made on queued items and a hill-climbing
heuristic that tries to maximize throughput while using as few threads
as possible.
The goal of starvation avoidance is to prevent deadlock. This kind of
deadlock can occur when a worker thread waits for a synchronization
event that can only be satisfied by a work item that is still pending
in the thread pool's global or local queues. If there were a fixed
number of worker threads, and all of those threads were similarly
blocked, the system would be unable to ever make further progress.
Adding a new worker thread resolves the problem.
A goal of the hill-climbing heuristic is to improve the utilization of
cores when threads are blocked by I/O or other wait conditions that
stall the processor. By default, the managed thread pool has one
worker thread per core. If one of these worker threads becomes
blocked, there's a chance that a core might be underutilized,
depending on the computer's overall workload. The thread injection
logic doesn't distinguish between a thread that's blocked and a thread
that's performing a lengthy, processor-intensive operation. Therefore,
whenever the thread pool's global or local queues contain pending work
items, active work items that take a long time to run (more than a
half second) can trigger the creation of new thread pool worker
threads."
You can mark a task as LongRunning, but this has the side effect of allocating a thread for it from outside the thread pool, which means that the task cannot be inlined.
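As a quick illustration (a sketch of mine, assuming the default scheduler), a LongRunning task gets its own dedicated thread rather than a pool thread, so it doesn't skew the pool's injection heuristics:
var longRunning = Task.Factory.StartNew(() =>
{
    // Runs on a dedicated (non-pool) thread.
    Console.WriteLine("IsThreadPoolThread: {0}",
        Thread.CurrentThread.IsThreadPoolThread); // prints False
    // ... lengthy work ...
}, TaskCreationOptions.LongRunning);
longRunning.Wait();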
Remember that Parallel.For treats the work it is given as blocks, so even if the work in one iteration is fairly small, the overall work done by the task invoked by the loop may appear longer to the scheduler.
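If that chunking is the issue, one option (a sketch on my part; EnumerablePartitionerOptions requires .NET 4.5 and Partitioner lives in System.Collections.Concurrent) is a no-buffering partitioner that hands items to workers one at a time, reusing the jobs and po variables from the answer above:
var oneAtATime = Partitioner.Create(jobs, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(oneAtATime, po, jobNr =>
{
    // ... loop body as before ...
});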
Most calls to the GC in and of themselves aren't blocking (it runs on a separate thread), but if you wait for the GC to complete then this does block. Remember also that the GC is rearranging memory, so this may have some side effects (and blocking) if you are trying to allocate memory while the GC is running. I don't have specifics here, but I know the PPL has some memory allocation features specifically for concurrent memory management for this reason.
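If GC pauses are what stall your worker threads, it may be worth checking which GC flavour you are running. This is only a sketch of how to inspect it (server GC is switched on with the <gcServer enabled="true"/> element under <runtime> in app.config), not a guaranteed fix:
using System;
using System.Runtime; // GCSettings

class GcInfo
{
    static void Main()
    {
        // Server GC uses one heap per core and can shorten the pauses
        // that make worker threads look blocked to the scheduler.
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}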
Looking at your code's output, it seems that things are running for many seconds, so I'm not surprised that you are seeing thread injection. However, I seem to remember that the default thread pool size is roughly 30 threads (probably depending on the number of cores on your system). A thread takes up roughly a MB of memory before your code allocates any more, so I'm not clear on why you could get an out of memory exception here.

I've posted the follow-up question "How to count the amount of concurrent threads in .NET application?"
If you count the threads directly, their number in Parallel.For() mostly only increases (very rarely and insignificantly decreasing) and is not released after loop completion.
I checked this in both Release and Debug mode, with
ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
and without.
The numbers vary, but the conclusions are the same.
Here is the complete code I was using, if someone wants to play with it:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Edit4Posting
{
    public class Node
    {
        public Node Previous { get; private set; }

        public Node(Node previous)
        {
            Previous = previous;
        }
    }

    public class Edit4Posting
    {
        public static void Main(string[] args)
        {
            int concurrentThreads = 0;
            int directThreadsCount = 0;
            int diagThreadCount = 0;

            var jobs = Enumerable.Range(0, 160);
            ParallelOptions po = new ParallelOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };
            Parallel.ForEach(jobs, po, delegate(int jobNr)
            //Parallel.ForEach(jobs, delegate(int jobNr)
            {
                int threadsRemaining = Interlocked.Increment(ref concurrentThreads);

                int heavyness = jobNr % 9;

                //Give the processor and the garbage collector something to do...
                List<Node> nodes = new List<Node>();
                Node current = null;
                //for (int y = 0; y < 1024 * 1024 * heavyness; y++)
                for (int y = 0; y < 1024 * 24 * heavyness; y++)
                {
                    current = new Node(current);
                    nodes.Add(current);
                }
                //*******************************
                directThreadsCount = Process.GetCurrentProcess().Threads.Count;
                //*******************************
                threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
                Console.WriteLine("[Job {0} complete. {1} threads remaining but directThreadsCount == {2}",
                    jobNr, threadsRemaining, directThreadsCount);
            });
            Console.WriteLine("FINISHED");
            Console.ReadLine();
        }
    }
}

Related

Multi-threading potentially long running operations

I am writing a Windows Service. I have a 'backlog' (in SQL) of records that have to be processed by the service. The backlog might also be empty. The record processing is a potentially very long running operation (3+ minutes).
I have a class with a method in it which would go to SQL, choose a record and process it, if there are any records to process. Then the method will exit and that's it. Note: I can't know in advance which records will be processed - the class method decides this as part of its logic.
I want to achieve parallel processing. I want to have X number of workers (where X is the optimal for the host PC) at any time. While the backlog is empty, those workers finish their jobs and exit pretty quickly (~50-100ms, maybe). I want any 'freed' worker to start over again (i.e. re-run).
I have done some reading and I deduce that ThreadPool is not a good option for long-running operations. The .NET 4.0+ parallel library is not a good option either, as I don't want to wait for all workers to finish, and I don't want to predefine/declare the tasks in advance.
In layman's terms: I want X workers that query the data source for items; when one of them finds an item it operates on it, while the rest continue to look for newly pushed items in the backlog.
What would be the best approach? Will I have to manage the threads entirely by myself? i.e. first determine the optimal number of threads (perhaps by checking Environment.ProcessorCount), then start X threads, monitor IsAlive on each thread and restart any that die? This seems awfully unprofessional.
Any suggestions?
You can start one task per core. As tasks finish, start new ones. You can set numOfThreads based on ProcessorCount or a specific number:
var tasks = new List<Task>();
int numOfThreads = System.Environment.ProcessorCount;
// int numOfThreads = X;
for (int i = 0; i < numOfThreads; i++)
    tasks.Add(Task.Factory.StartNew(() => { /* do work */ }));

while (tasks.Count > 0) // wait for tasks to finish
{
    int index = Task.WaitAny(tasks.ToArray());
    tasks.RemoveAt(index);
    if (workRemains) // pseudocode: check your backlog here
        tasks.Add(Task.Factory.StartNew(() => { /* do work */ }));
}
or
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = System.Environment.ProcessorCount;
Parallel.For(0, N, options, (i) => { /* long running computation */ });
or
You can implement the Producer-Consumer pattern with BlockingCollection.
This topic is taught excellently by Dr. Joe Hummel in his Pluralsight course "Async and parallel programming: Application design".
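A minimal sketch of that pattern (my own outline, with illustrative names): a fixed set of long-lived consumers block on the collection while the backlog is empty and wake up as soon as the producer adds work.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    static void Process(int item) { /* your record processing here */ }

    static void Main()
    {
        var backlog = new BlockingCollection<int>(boundedCapacity: 100);

        // X long-lived consumers; LongRunning keeps them off the pool's injection heuristics.
        int workers = Environment.ProcessorCount;
        var consumers = new Task[workers];
        for (int i = 0; i < workers; i++)
        {
            consumers[i] = Task.Factory.StartNew(() =>
            {
                // Blocks while the backlog is empty, resumes when items arrive.
                foreach (var item in backlog.GetConsumingEnumerable())
                    Process(item);
            }, TaskCreationOptions.LongRunning);
        }

        for (int i = 0; i < 1000; i++) backlog.Add(i); // producer side
        backlog.CompleteAdding();                      // lets consumers drain and exit
        Task.WaitAll(consumers);
    }
}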
Consider using ActionBlock<T> from the TPL Dataflow library. It can be configured to process multiple messages concurrently, using all available CPU cores.
ActionBlock<QueueItem> _processor;
Task _completionTask;
bool _done;

async Task ReadQueueAsync(int pollingInterval)
{
    while (!_done)
    {
        // Get a list of items to process from SQL database
        var list = ...;

        // Schedule the work
        foreach (var item in list)
        {
            _processor.Post(item);
        }

        // Give SQL server time to re-fill the queue
        await Task.Delay(pollingInterval);
    }

    // Signal the processor that we are done
    _processor.Complete();
}

void ProcessItem(QueueItem item)
{
    // Do your work here
}

void Setup()
{
    // Configure the action block to process items concurrently
    // using all available CPU cores
    _processor = new ActionBlock<QueueItem>(new Action<QueueItem>(ProcessItem),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
    _done = false;
    var queueReaderTask = ReadQueueAsync(QUEUE_POLL_INTERVAL);
    _completionTask = Task.WhenAll(queueReaderTask, _processor.Completion);
}

void Complete()
{
    _done = true;
    _completionTask.Wait();
}
Per MaxDegreeOfParallelism's documentation: "Generally, you do not need to modify this setting. However, you may choose to set it explicitly in advanced usage scenarios such as these:
1. When you know that a particular algorithm you're using won't scale beyond a certain number of cores. You can set the property to avoid wasting cycles on additional cores.
2. When you're running multiple algorithms concurrently and want to manually define how much of the system each algorithm can utilize. You can set a MaxDegreeOfParallelism value for each.
3. When the thread pool's heuristics is unable to determine the right number of threads to use and could end up injecting too many threads. For example, in long-running loop body iterations, the thread pool might not be able to tell the difference between reasonable progress or livelock or deadlock, and might not be able to reclaim threads that were added to improve performance. In this case, you can set the property to ensure that you don't use more than a reasonable number of threads."
If you do not have an advanced usage scenario like the 3 cases above, you may want to hand your list of items or tasks to be run to the Task Parallel Library and let the framework handle the processor count.
List<InfoObject> infoList = GetInfo();
ConcurrentQueue<ResultObject> output = new ConcurrentQueue<ResultObject>();
await Task.Run(() =>
{
    Parallel.ForEach<InfoObject>(infoList, (item) =>
    {
        ResultObject result = ProcessInfo(item);
        output.Enqueue(result);
    });
});
foreach (var resultObj in output)
{
    ReportOnResultObject(resultObj);
}
OR
List<InfoObject> infoList = GetInfo();
List<Task<ResultObject>> tasks = new List<Task<ResultObject>>();
foreach (var item in infoList)
{
    tasks.Add(Task.Run(() => ProcessInfo(item)));
}
var results = await Task.WhenAll(tasks);
foreach (var resultObj in results)
{
    ReportOnResultObject(resultObj);
}
H/T to IAmTimCorey tutorials:
https://www.youtube.com/watch?v=2moh18sh5p4
https://www.youtube.com/watch?v=ZTKGRJy5P2M

Why is the Completed callback from SocketAsyncEventArgs frequently executed in newly created threads instead of using a bounded thread pool?

I have a simple client application that receives byte buffers from the network with a low throughput. Here is the code:
private static readonly HashSet<int> _capturedThreadIds = new HashSet<int>();

// Native thread id, for comparing against the OS-level view in Process Explorer
// (requires using System.Runtime.InteropServices;)
[DllImport("kernel32.dll")]
private static extern uint GetCurrentThreadId();

private static void RunClient(Socket socket)
{
    var e = new SocketAsyncEventArgs();
    e.SetBuffer(new byte[10000], 0, 10000);
    e.Completed += SocketAsyncEventsArgsCompleted;
    Receive(socket, e);
}

private static void Receive(Socket socket, SocketAsyncEventArgs e)
{
    var isAsynchronous = socket.ReceiveAsync(e);
    if (!isAsynchronous)
        SocketAsyncEventsArgsCompleted(socket, e);
}

private static void SocketAsyncEventsArgsCompleted(object sender, SocketAsyncEventArgs e)
{
    if (e.LastOperation != SocketAsyncOperation.Receive || e.SocketError != SocketError.Success || e.BytesTransferred <= 0)
    {
        Console.WriteLine("Operation: {0}, Error: {1}, BytesTransferred: {2}", e.LastOperation, e.SocketError, e.BytesTransferred);
        return;
    }

    var thread = Thread.CurrentThread;
    if (_capturedThreadIds.Add(thread.ManagedThreadId))
        Console.WriteLine("New thread, ManagedId: " + thread.ManagedThreadId + ", NativeId: " + GetCurrentThreadId());

    //Console.WriteLine(e.BytesTransferred);
    Receive((Socket)sender, e);
}
The threading behavior of the application is quite curious:
The SocketAsyncEventsArgsCompleted method is frequently run in new threads. I would have expected that after some time no new thread would be created. I would have expected the threads to be reused, because of the thread pool (or IOCP thread pool) and because the throughput is very stable.
The number of threads stays low, but I can see in the process explorer that threads are frequently created and destroyed. Likewise, I would not have expected threads to be created or destroyed.
Can you explain the application behavior?
Edit: The "low" throughput is 20 messages per second (roughly 200 KB/s). If I increase the throughput to more than 1000 messages per second (50 MB/s), the application behavior does not change.
The low application throughput itself cannot explain the thread creation and destruction. The socket receives 20 messages per second, which is more than enough to keep a thread alive (waiting threads are destroyed after spending 10 seconds idle).
This problem is related to thread pool thread injection, i.e. the thread creation and destruction strategy. Thread pool threads are regularly injected and destroyed in order to measure the impact of new threads on the thread pool's throughput.
This is called thread probing. It is clearly explained in the Channel 9 video CLR 4 - Inside the Thread Pool (jump to 26:30).
It seems that thread probing is always done with newly created threads instead of moving a thread in and out of the pool. I suppose this works better for most applications because it avoids keeping an unused thread alive.
From MSDN
Beginning with the .NET Framework 4, the thread pool creates and
destroys worker threads in order to optimize throughput, which is
defined as the number of tasks that complete per unit of time. Too few
threads might not make optimal use of available resources, whereas too
many threads could increase resource contention.
Note
When demand is low, the actual number of thread pool threads can
fall below the minimum values.
Basically it sounds like your low throughput is causing the thread pool to destroy threads, since they are not required and are just sitting there taking up resources. I wouldn't worry about it. As MS explicitly state:
In most cases the thread pool will perform better with its own
algorithm for allocating threads.
If you're really bothered, you could always poll ThreadPool.GetAvailableThreads() to watch the pool, and see how different network throughputs affect it.
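Something like this (a rough monitoring sketch of mine, assuming a console app; GetAvailableThreads reports how many threads the pool could still create, so subtracting from the maximum gives the busy count):
int maxWorkers, maxIo, freeWorkers, freeIo;
ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
while (true)
{
    ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
    Console.WriteLine("Busy workers: {0}, busy IOCP threads: {1}",
        maxWorkers - freeWorkers, maxIo - freeIo);
    Thread.Sleep(1000);
}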

Thread pool and the threads it provides

My code is
static void Main(string[] args)
{
    for (int i = 0; i < 100; i++)
    {
        ThreadPool.QueueUserWorkItem(y =>
        {
            Console.WriteLine(Thread.CurrentThread.ManagedThreadId);
            Thread.Sleep(3000);
        });
    }
    Console.Read();
}
When I start the program and inspect it with sos.dll, I can see that the thread pool only gives me 4-5 threads each time. A delay then occurs because the pool doesn't provide more threads.
Why is this happening?
ThreadPool Class:
There is one thread pool per process. Beginning with the .NET Framework 4, the default size of the thread pool for a process depends on several factors, such as the size of the virtual address space. A process can call the GetMaxThreads method to determine the number of threads. The number of threads in the thread pool can be changed by using the SetMaxThreads method. Each thread uses the default stack size and runs at the default priority.
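For illustration (a sketch of mine; the exact numbers are machine dependent), you can query the limits, and raising the minimum makes the pool create threads immediately on demand instead of throttling injection to roughly one new thread every ~500 ms above the minimum:
int minWorkers, minIo, maxWorkers, maxIo;
ThreadPool.GetMinThreads(out minWorkers, out minIo);
ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
Console.WriteLine("Workers: min {0}, max {1}", minWorkers, maxWorkers);

// Threads up to the minimum are created as soon as work is queued;
// beyond it, the pool deliberately delays injection.
ThreadPool.SetMinThreads(32, minIo);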
As an additional note, depending on system resources (like CPU cores, RAM, etc.), more threads may not make your application run faster.

Thread count growth when using Task Parallel Library

I'm using the C# TPL and I'm having a problem with producer/consumer code... for some reason, the TPL doesn't reuse threads and keeps creating new ones without stopping.
I made a simple example to demonstrate this behavior:
class Program
{
    static BlockingCollection<int> m_Buffer = new BlockingCollection<int>(1);
    static CancellationTokenSource m_Cts = new CancellationTokenSource();

    static void Producer()
    {
        try
        {
            while (!m_Cts.IsCancellationRequested)
            {
                Console.WriteLine("Enqueuing job");
                m_Buffer.Add(0);
                Thread.Sleep(1000);
            }
        }
        finally
        {
            m_Buffer.CompleteAdding();
        }
    }

    static void Consumer()
    {
        Parallel.ForEach(m_Buffer.GetConsumingEnumerable(), Run);
    }

    static void Run(int i)
    {
        Console.WriteLine
            ("Job Processed\tThread: {0}\tProcess Thread Count: {1}",
            Thread.CurrentThread.ManagedThreadId,
            Process.GetCurrentProcess().Threads.Count);
    }

    static void Main(string[] args)
    {
        Task producer = new Task(Producer);
        Task consumer = new Task(Consumer);
        producer.Start();
        consumer.Start();
        Console.ReadKey();
        m_Cts.Cancel();
        Task.WaitAll(producer, consumer);
    }
}
This code creates 2 tasks, producer and consumer. Producer adds 1 work item every second, and Consumer only prints out a string with information. I would assume that 1 consumer thread is enough in this situation, because tasks are processed much faster than they are added to the queue, but what actually happens is that every second the number of threads in the process grows by 1... as if the TPL is creating a new thread for every item.
While trying to understand what's happening, I also noticed another thing: even though the BlockingCollection size is 1, after a while Consumer starts getting called in bursts. For example, this is how it starts:
Enqueuing job
Job Processed Thread: 4 Process Thread Count: 9
Enqueuing job
Job Processed Thread: 6 Process Thread Count: 9
Enqueuing job
Job Processed Thread: 5 Process Thread Count: 10
Enqueuing job
Job Processed Thread: 4 Process Thread Count: 10
Enqueuing job
Job Processed Thread: 6 Process Thread Count: 11
and this is how it's processing items less than a minute later:
Enqueuing job
Job Processed Thread: 25 Process Thread Count: 52
Enqueuing job
Enqueuing job
Job Processed Thread: 5 Process Thread Count: 54
Job Processed Thread: 5 Process Thread Count: 54
and because threads get disposed after the Parallel.ForEach loop finishes (I don't show it in this example, but it was in the real project) I assumed it had something to do with ForEach specifically... I found this article http://reedcopsey.com/2010/01/26/parallelism-in-net-part-5-partitioning-of-work/, and I thought my problem was caused by the default partitioner, so I took a custom partitioner from the TPL Examples that feeds the Consumer threads items one by one, and although it fixed the order of execution (got rid of the delay)...
Enqueuing job
Job Processed Thread: 71 Process Thread Count: 140
Enqueuing job
Job Processed Thread: 12 Process Thread Count: 141
Enqueuing job
Job Processed Thread: 72 Process Thread Count: 142
Enqueuing job
Job Processed Thread: 38 Process Thread Count: 143
Enqueuing job
Job Processed Thread: 73 Process Thread Count: 143
Enqueuing job
Job Processed Thread: 21 Process Thread Count: 144
Enqueuing job
Job Processed Thread: 74 Process Thread Count: 145
...it didn't stop threads from growing
I know about ParallelOptions.MaxDegreeOfParallelism, but I still want to understand what's happening with TPL and why it creates hundreds of threads for no reason
In my project I have code that has to run for hours, reading new data from a database, putting it into a BlockingCollection and having the data processed by other code. There is 1 new item about every 5 seconds, and it takes from several milliseconds to almost a minute to process one. After running for about 10 minutes, the thread count reached over 1,000 threads.
There are two things that together cause this behavior:
ThreadPool tries to use the optimal number of threads for your situation. But if one of the threads in the pool blocks, the pool sees this as if that thread wasn't doing any useful work and so it tends to create another thread soon after that. What this means is that if you have a lot of blocking, ThreadPool is really bad at guessing the optimal number of threads and it tends to create new threads until it reaches the limit.
Parallel.ForEach() trusts the ThreadPool to guess the correct number of threads, unless you set the maximum number of threads explicitly. Parallel.ForEach() was also primarily meant for bounded collections, not streams of data.
When you combine these two things with GetConsumingEnumerable(), what you get is that Parallel.ForEach() creates threads that are almost always blocked. The ThreadPool sees this, and, to try to keep the CPU utilized, creates more and more threads.
The correct solution here is to set MaxDegreeOfParallelism. If your computations are CPU-bound, the best value is most likely Environment.ProcessorCount. If they are IO-bound, you will have to find out the best value experimentally.
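Applied to the Consumer from the question, that is a one-line change:
static void Consumer()
{
    var po = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
    Parallel.ForEach(m_Buffer.GetConsumingEnumerable(), po, Run);
}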
Another option, if you can use .Net 4.5, is to use TPL Dataflow. This library was made specifically to process streams of data, like you have, so it doesn't have the problems your code has. It's actually even better than that and doesn't use any threads at all when it's not processing anything currently.
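A small sketch of the Dataflow equivalent (my illustration; assumes the System.Threading.Tasks.Dataflow NuGet package): an ActionBlock with a bounded capacity processes the stream without any consumer threads sitting blocked on an empty queue.
using System;
using System.Threading.Tasks.Dataflow; // TPL Dataflow (separate NuGet package)

class DataflowSketch
{
    static void Main()
    {
        var worker = new ActionBlock<int>(i => Console.WriteLine("Processed {0}", i),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 1 // producer waits instead of queueing without limit
            });

        for (int i = 0; i < 100; i++)
            worker.SendAsync(i).Wait(); // blocks the producer only, not a pool of consumers

        worker.Complete();
        worker.Completion.Wait();
    }
}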
Note: There is also a good reason why a new thread is created for each new item, but explaining that would require me to explain how Parallel.ForEach() works in more detail, and I feel that's not necessary here.

Why Thread.Sleep() is so CPU intensive?

I have an ASP.NET page with this pseudo-code:
while (read)
{
    Response.OutputStream.Write(buffer, 0, buffer.Length);
    Response.Flush();
}
Any client who requests this page will start to download a binary file. Everything is OK at this point but clients had no limit in download speed so changed the above code to this:
while (read)
{
    Response.OutputStream.Write(buffer, 0, buffer.Length);
    Response.Flush();
    Thread.Sleep(500);
}
The speed problem is solved now, but under test with 100 concurrent clients connecting one after another (with a 3 second lag between each new connection), CPU usage increases as the number of clients grows, and with 70-80 concurrent clients the CPU reaches 100% and any new connection is refused. Numbers may differ on other machines, but the question is: why is Thread.Sleep() so CPU intensive, and is there any way to slow down the client without the CPU usage rising?
I can do it at IIS level but I need more control from inside of my application.
Let's take a look at whether Michael's answer seems reasonable.
Now, Michael wisely points out that Thread.Sleep(500) shouldn't cost much in the way of CPU. That's all well and good in theory, but let's see if that pans out in practice.
static void Main(string[] args) {
    for (int i = 0; i != 10000; ++i)
    {
        Thread.Sleep(500);
    }
}
Running this, the CPU use of the application hovers around the 0% mark.
Michael also points out that since all the threads that ASP.NET has to use are sleeping, it will have to spawn new threads, and offers that this is expensive. Let's try not sleeping, but doing lots of spawning:
static void Main(string[] args) {
    for (int i = 0; i != 10000; ++i)
    {
        new Thread(o => {}).Start();
    }
}
We create lots of threads, but they just execute a null operation. That uses a lot of CPU, even though the threads aren't doing anything.
The total number of threads never gets very high though, because each lives for such a short time. Let's combine the two:
static void Main(string[] args) {
    for (int i = 0; i != 10000; ++i)
    {
        new Thread(o => { Thread.Sleep(500); }).Start();
    }
}
Adding this operation that we have shown to be low in CPU use to each thread increases CPU use even more, as the threads mount up. If I run it in a debugger it pushes up to near 100% CPU. If I run it outside of a debugger, it performs a bit better, but only because it throws an out of memory exception before it gets a chance to hit 100%.
So, it isn't Thread.Sleep itself that is the problem, but the side-effect that having all available threads sleep forces more and more threads to be created to handle other work, just as Michael said.
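For completeness, the non-blocking way to get the same pacing (a sketch of mine using Task.Delay, which requires .NET 4.5, whereas the question targets an earlier framework; needs using System.IO and System.Threading.Tasks) hands the thread back to the pool during each delay instead of parking it:
// Hypothetical helper, not from the question: same 500 ms pacing,
// but the thread returns to the pool while the delay runs.
static async Task WriteThrottledAsync(Stream output, byte[] buffer, int chunks)
{
    for (int i = 0; i < chunks; i++)
    {
        await output.WriteAsync(buffer, 0, buffer.Length);
        await Task.Delay(500); // timer-based; no thread sleeps here
    }
}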
Just a guess:
I don't think it's Thread.Sleep() that's tying up the CPU - it's the fact that you're causing threads to be tied up responding to a request for so long, and the system needs to spin up new threads (and other resources) to respond to new requests since those sleeping threads are no longer available in the thread pool.
Rather than an ASP.NET page, you should implement an IHttpAsyncHandler. ASP.NET page code puts many things between your code and the browser that are not appropriate for transferring binary files. Also, since you're attempting to perform rate limiting, you should use asynchronous code to limit resource usage, which would be difficult in an ASP.NET page.
Creating an IHttpAsyncHandler is fairly simple. Just trigger some asynchronous operations in the BeginProcessRequest method, and don't forget to properly close the context to show you have reached the end of the file. IIS won't be able to close it for you here.
The following is my (really bad) example of how to perform an asynchronous operation consisting of a series of steps, counting from 0 to 10, each performed at a 500 ms interval.
using System;
using System.Threading;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main()
        {
            // Create IO instances
            EventWaitHandle WaitHandle = new EventWaitHandle(false, EventResetMode.AutoReset); // We don't actually fire this event, just need a ref
            EventWaitHandle StopWaitHandle = new EventWaitHandle(false, EventResetMode.AutoReset);
            int Counter = 0;

            WaitOrTimerCallback AsyncIOMethod = (s, t) => { };
            AsyncIOMethod = (s, t) =>
            {
                // Handle IO step
                Counter++;
                Console.WriteLine(Counter);

                if (Counter >= 10)
                    // Counter has reached 10 so we stop
                    StopWaitHandle.Set();
                else
                    // Register the next step in the thread pool
                    ThreadPool.RegisterWaitForSingleObject(WaitHandle, AsyncIOMethod, null, 500, true);
            };

            // Do initial IO
            Console.WriteLine(Counter);

            // Register the first step in the thread pool
            ThreadPool.RegisterWaitForSingleObject(WaitHandle, AsyncIOMethod, null, 500, true);

            // We force the main thread to wait here so that the demo doesn't close instantly
            StopWaitHandle.WaitOne();
        }
    }
}
You'll also need to register your IHttpAsyncHandler implementation with IIS in whichever way is appropriate for your situation.
It's because the thread gets a priority boost every time it yields its time slice. Avoid calling Sleep often (particularly with low values).
