I have a multi-core computer running long-duration math computations using the System.Threading.ThreadPool class in .NET 4.5.2. These computations might execute for days or even weeks. I do not want to create my own thread pools; using ThreadPool is very convenient. However, on my multi-core machine the thread pool manager is very greedy and load-balances across all of the cores. Unfortunately, my work on each core is also greedy: it wants to monopolize the core and execute at 100% until its assignment completes. I'm fine with this, except that the operating system freezes up. Literally, I can't move the mouse around or interact with the desktop.

What I would like to do is reserve one of the cores for the operating system, so that the mouse and GUI stay responsive. It seems logical to exclude one core (and its available threads) so the OS can operate.
Does anyone know how to accomplish this using System.Threading.ThreadPool?
**ANSWER**
To begin, my question was faulty. This was due to my inexperience with the subject. And if a Google search brought you to this question, your thinking probably followed the same faulty path, as evidenced by your search terms. But this is good: now you can learn the proper way. Here it is.
The short answer to this question: System.Threading.ThreadPool cannot solve your issue.
A slightly better answer: the Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces, introduced in .NET Framework 4. The TPL scales the degree of concurrency dynamically to use all available cores efficiently. By using the TPL, you can maximize the performance of your code while focusing on the work your program is designed to accomplish.
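For example (a minimal sketch of my own, not part of the original answer; workItemCount is a placeholder): a ParallelOptions cap on the degree of parallelism is also how you would leave one core free for the OS, which is what the original question was after.

var options = new ParallelOptions
{
    // Assumption: leaving one logical core idle keeps the desktop responsive.
    MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount - 1)
};

Parallel.For(0, workItemCount, options, i =>
{
    // long-running math computation for work item i goes here
});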
Good luck and happy coding to you!
@ScottChamberlain: because the math operation relies on extracting a value from an external network resource. That resource may be blocked for reasons I can't control (perhaps another user is accessing it, blocking until they release it). – sapbucket
What you need to do is separate getting the data from the network resource from the processing of that data. Use a BlockingCollection as a buffer that pipelines work between the two parts.
BlockingCollection<YourData> _collection = new BlockingCollection<YourData>();

public void ProcessInParallel()
{
    // This starts up your workers; it creates one fewer worker than the number of cores you have.
    var tasks = new List<Task>();
    for (int i = 0; i < Environment.ProcessorCount - 1; i++)
    {
        var task = Task.Factory.StartNew(DataProcessorLoop, TaskCreationOptions.LongRunning);
        tasks.Add(task);
    }

    // This loop could be done in parallel too; _collection.Add is safe to call from multiple threads.
    foreach (YourData data in GetYourDataFromTheNetwork())
    {
        // Put data into the collection; the first available worker will take it.
        _collection.Add(data);
    }

    // Let the consumers know you will not be adding any more data.
    _collection.CompleteAdding();

    // Wait for all of the worker tasks to drain the collection and finish.
    Task.WaitAll(tasks.ToArray());
}

private void DataProcessorLoop()
{
    // This pulls data from the collection of work to do; when there is no work to do
    // it blocks until more work shows up or CompleteAdding is called.
    foreach (YourData data in _collection.GetConsumingEnumerable())
    {
        CrunchData(data);
    }
}
Related
I work as the sole application developer within a database-focussed team. Recently, I've been trying to improve the efficiency of a process which my predecessor had prototyped. The best way to do this was to thread it. So this was my approach:
public void DoSomething()
{
    Parallel.ForEach(rowCollection, (fr) =>
    {
        fr.Result = MyCleaningOperation();
    });
}
Which functions fine, but causes errors. The errors arise in a third-party tool the code is calling. This tool is supposed to be thread-safe, but it looks strongly as though the errors occur when two threads try to perform the same operation at the same time.
So I went back to the prototype. Previously I'd only looked at it to see how to talk to the third-party tool. But when I examined the called code, I discovered my predecessor had threaded it using Task and Action, constructs with which I'm not familiar.
Action<object> MyCleaningOperation = (object obj) =>
{
    // invoke the third-party tool.
};

public void Main()
{
    Task[] taskCollection = new Task[rowCollection.Length];
    for (int i = 0; i < rowCollection.Length; i++)
    {
        taskCollection[i] = new Task(MyCleaningOperation, i);
    }
    foreach (var task in taskCollection)
    {
        task.Start();
    }
    try
    {
        Task.WaitAll(taskCollection);
    }
    catch (Exception)
    {
        throw; // rethrow, preserving the original stack trace
    }
}
Now, that's not great code but it is a prototype. Allegedly his prototype did not error and ran at a greater speed than mine. I cannot verify this because his prototype was dependent on a dev database that no longer exists.
I don't particularly want to go on a wild goose chase, trying out different kinds of threading in my app to see whether some throw errors and some don't; the errors are intermittent, so it would be a long, drawn-out process. More so because, having read about Task, I cannot see any reason why it would work more effectively than Parallel. And because I'm using a void function I cannot easily add an await to mimic the prototype operation.
So: is there an operational difference between the two? Or any other reason why one might cause a tool to trip up with multiple threads using the same resource and the other not?
Action<T> is a void-returning delegate which takes a T. It represents an operation which consumes a T, produces nothing, and is started when invoked.
Task<T> is what it says on the tin: it represents a job that is possibly not yet complete, and when it is complete, it provides a T to its completion.
So, let's make sure you've got it so far: what is the completion of a Task<T>?
Don't read on until you've sussed it out.
.
.
.
.
.
The completion of a task is an action. A task produces a T in the future; an action performs an action on that T when it is available.
All right, so then what is a Task, no T? A task that does not produce a value when it completes. What's the completion of a Task? Plainly an Action.
How can we describe the task performed by a Task then? It does something but produces no result. So that's an Action. Suppose the task requires that it consumes an object to do its work; then that's an Action<object>.
Make sure you understand the relationships here. They are a bit tricky but they all make sense. The names are carefully chosen.
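To make those relationships concrete (a minimal sketch of my own, not part of the original answer):

// Task<int>: a job that will produce an int at some point in the future.
Task<int> sum = Task.Run(() => 2 + 2);

// Action<int>: consumes an int, produces nothing.
Action<int> print = n => Console.WriteLine(n);

// The completion of the task feeds the produced value to the action.
sum.ContinueWith(t => print(t.Result));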
So what then is a thread? A thread is a worker that can do tasks. Do not confuse tasks with threads.
having read about Task I cannot see any reason why it would work more effectively than Parallel.
You see what I mean I hope. This sentence makes no sense. Tasks are just that: tasks. Deliver this book to this address. Add these numbers. Mow this lawn. Tasks are not workers, and they are certainly not the concept of "hire a bunch of workers to do tasks". Parallelism is a strategy for assigning workers to tasks.
Moreover, do not fall into the trap of believing that tasks are inherently parallel. There is no requirement that tasks be performed simultaneously by multiple workers; much of the work we've done in C# in the past few years has been to ensure that tasks may be performed efficiently by a single worker. If you need to make breakfast, mow the lawn and pick up the mail, you don't need to hire a staff to do those things, but you can still pick up the mail while the toast is toasting.
You should examine carefully your claim that the best way to increase performance is to parallelize. Remember, parallelization is simply hiring as many workers as there are CPU cores to run them, and then handing out tasks to each. This is only an improvement if (1) the tasks can actually be run in parallel, independently, (2) the tasks are gated on CPU, not I/O, and (3) you can write programs that are correct in the face of multiple threads of execution in the same program.
If your tasks really are "embarrassingly parallel" and can run completely independently of each other then you might consider process parallelism rather than thread parallelism. It's safer and easier.
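A sketch of what process parallelism could look like (my illustration; worker.exe and chunkCount are hypothetical, and Process lives in System.Diagnostics):

// Launch one worker process per chunk of input; the OS isolates them completely.
var workers = new List<Process>();
for (int chunk = 0; chunk < chunkCount; chunk++)
{
    workers.Add(Process.Start("worker.exe", chunk.ToString()));
}
foreach (var p in workers)
{
    p.WaitForExit(); // collect the workers when they finish
}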
The errors are arising in a third-party tool the call is coding. This tool is supposed to be thread safe, but it looks strongly as though they're arising when two threads try and perform the same operation at the same time.
If that's correct, then parallel tasks won't prevent errors any more than Parallel will.
But when I examined the called code, I discovered my predecessor had threaded it using Task and Action, operators with which I'm not familiar.
That code looks OK, though it does use the task constructor combined with Start, which would be more elegantly expressed with Task.Run.
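For example (my sketch, reusing the question's MyCleaningOperation delegate and loop variable):

// Construct-then-start, as in the prototype:
Task task = new Task(MyCleaningOperation, i);
task.Start();

// The same work scheduled in one call (.NET 4.5+):
Task task2 = Task.Run(() => MyCleaningOperation(i));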
The prototype is using a dynamic task-based parallelism approach, which is overkill for this situation. Your code is using parallel loops, which is more appropriate for data parallelism (see Selecting the Right Pattern and Figure 1).
Allegedly his prototype did not error and ran at a greater speed than mine.
If the error is due to a multithreading bug in the third-party tool, then the prototype was just as susceptible to those errors. Perhaps it was using an earlier version of the tool, or the data in the dev database did not expose the bug, or it just got lucky.
Regarding performance, I would expect Parallel to have superior performance to plain task parallelism in general, because Parallel can "batch" operations among tasks, reducing the overhead. Though that extra logic does come with a cost, too, so for small data sizes it could be less performant.
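To illustrate the kind of batching meant here (a sketch of my own using range partitioning; not code from the question or the original answer):

// Partitioner lives in System.Collections.Concurrent.
// Each worker receives a contiguous range of indices (a batch),
// rather than being handed one element at a time.
Parallel.ForEach(Partitioner.Create(0, rowCollection.Length), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        rowCollection[i].Result = MyCleaningOperation();
    }
});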
IMO the bigger question is the correctness, and if it fails with Parallel, then it could just as easily fail with parallel tasks.
On the surface, the difference between a task and a thread is this:
A thread is one of the ways to involve the operating system and the processor in having the computer do more than one thing at a time: something that is scheduled on a processor and allowed to execute, potentially (and these days, most often) at the same time that other things execute, simply because today's processors can do more than one thing at once.
A task, in the context of Task or Task<T>, on the other hand, is the representation of something that has the potential of completing at some point in the future, and that then represents the result of that completion.
That's basically it.
Sure, you can wrap a thread in a task, but if your question is just "what is the difference between a thread and a task" then the above is it.
You can easily represent things that have nothing to do with threads, or even with parallel execution of code, as a task, and it would still be a task. Asynchronous I/O uses tasks heavily these days, and most of those (at least the good implementations) don't use (extra) threads at all.
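A small sketch of that idea (my illustration, not from the original answer): an awaited read hands back a Task that completes when the I/O finishes, with no thread blocked in the meantime.

async Task<string> ReadFileAsync(string path)
{
    using (var reader = new StreamReader(path))
    {
        // No thread sits blocked while the OS performs the read;
        // the returned Task completes when the data arrives.
        return await reader.ReadToEndAsync();
    }
}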
First of all, I did not study computer science and I taught myself programming. That said:
I have a C# program that runs heavy power flow simulations for very large demand profiles.
I use a laptop with an intel i7 processor (4 cores -> 8 threads) under windows 7.
When I run the simulations, the processor usage is around 32%.
I have read other threads about process priority, and I understand more or less that when something runs on the OS it runs at full speed, but the OS keeps the interfaces responsive (is this correct?).
Well, I want to "completely flood the processor" with the simulation and get 100% usage, if possible.
Thanks in advance.
Ref#1: Is there a way of restricting an API's processor resource in c#?
Ref#2: Multiple Processors and PerformanceCounter C#
EDIT: the piece of code that calls the simulations, after removing the irrelevant stuff:
while (current_step < sim_times.Count)
{
    bool repeat_all = false;
    power_flow(sim_times[current_step]);
    current_step++;
}
I know it is super simple, and it is a while loop because in the original code I may want to repeat a certain number of steps.
The power_flow() function calls third-party software, so I guess it is that third-party software that should do the multithreading, isn't it?
You can't really force full usage - you need to provide more work for the processor to do. You could do this by increasing the number of threads to process more data in parallel. If you provide your samples of your source code we could provide specific advice on how you could alter your code to achieve this.
If you are using a third party piece of software for data processing, this often makes it difficult to split into multiple threads. One tactic that's often helpful is to split up your input data, then start a new thread for each data set. This requires domain specific knowledge to know what you can split up. For simulations, once you have split up one run as much as possible, an alternative is to process multiple runs in parallel.
The Task Parallel Library can be really useful to break down your code into multiple threads without much refactoring. Particularly the data parallelism section.
One big note of caution: you need to make sure what you're doing is thread-safe. I'll provide some further reading for you. The basic principle is that if you're sharing data between threads, you need to be very careful that they don't affect one another; this can cause bizarre problems!
As for your question regarding interfaces - within your process you can allocate thread priority to each thread. An interface thread is just a thread like any other. Usually a UI thread is given the highest priority to remain responsive, whereas a long background process is given a normal/below normal priority as it can be processed during idle time. You can set the priority manually, the default for any new thread is Normal.
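For instance (a small sketch of my own, with a hypothetical RunSimulation method):

var worker = new Thread(RunSimulation)
{
    Priority = ThreadPriority.BelowNormal, // let the UI thread win contention for a core
    IsBackground = true
};
worker.Start();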
You should process these simulations in parallel so that you use as many CPUs as possible. Do this by creating a Task for each simulation run.
using System.Threading.Tasks;
...
List<Task> tasks = new List<Task>();
for (; current_step < sim_times.Count; current_step++)
{
    var simTime = sim_times[current_step]; // extract the sim parameter
    // Create a 'hot' task, one that is immediately scheduled for execution,
    // and remember it so we can wait on it later.
    tasks.Add(Task.Factory.StartNew(() => power_flow(simTime)));
}
Task.WaitAll(tasks.ToArray()); // wait for the simulations to finish so that you can process results.
Data Parallelism (Task Parallel Library)
I'm creating a physics simulation, which is meant to be realistic, and I have it working correctly, but the framerate drops off quite quickly.
I'm iterating through each of the objects, and then again for each of those objects.
I'm not sure why this would be the case, since the number of operations in each frame remains the same. The only thing I can think of is that the threading is the problem. I have the iteration split into four parts, with one quarter of the list calculated on each of 4 separate threads, but when I check Task Manager, I'm only really using one core.
Here is the pertinent code:
private void Update(GameTime gameTime)
{
    for (int i = 0; i < Bodies.Count; i++)
    {
        Bodies[i].Update(gameTime);
    }
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculatePhysics0));
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculatePhysics1));
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculatePhysics2));
    ThreadPool.QueueUserWorkItem(new WaitCallback(CalculatePhysics3));
}

private void CalculatePhysics0(object o)
{
    for (int i = 0; i < Bodies.Count / 4; i++)
    {
        Body body = Bodies[i];
        g.ApplyTo(ref body, Bodies);
    }
}

// 3 other methods exactly the same, but iterating their portion of the list
I'm not very experienced with multithreading. I can deal with the problems that arise from its use, though. I can see that the problem may be that the ThreadPool is not a good way to achieve my desired effect, which is to iterate through the list concurrently between threads.
You can use the Task Parallel Library, which helps make most of this much easier. This should be available in XNA 4.0+
Task.Factory.StartNew(() => CalculatePhysics0(null)); // the lambda must invoke the method; a bare method group would never run
I believe the default behavior should work, but you can specify a TaskCreationOptions value.
If that does not work, you can use a custom TaskScheduler that caps the concurrency level. TaskScheduler itself is abstract and its MaximumConcurrencyLevel is read-only, so you need a concrete implementation such as the LimitedConcurrencyLevelTaskScheduler from the MSDN samples:
var scheduler = new LimitedConcurrencyLevelTaskScheduler(4);
Task.Factory.StartNew(() => CalculatePhysics0(null), CancellationToken.None, TaskCreationOptions.None, scheduler);
In regards to a comment you've left on another answer: more threads != performance increase. You may be surprised how many things actually perform better in serial rather than in parallel. Of course, that doesn't mean that your code can't benefit from multiple threads on different cores, just that you shouldn't assume adding more threads will magically cause your processing throughput to increase.
It's difficult to say what will help your code perform better without being able to look at more of it. First, it helps to know a little about the threading model in .NET. I would recommend you read about it, since I can't go into much detail here. A thread in .NET is not necessarily a native thread. That is to say, by the time you queue the third physics method in the ThreadPool, the first might be done, and so the pool will just reuse the thread it already created. Of course, there's also the case where you queue a task right before another one finishes, and an additional (costly) native thread has to be created. In that case, fewer threads might have been better.
The idea of abstraction goes further when you look at the Task Parallel Library, where each task may seem like a thread, but really is much further from it. Many tasks can wind up running on the same thread. You can get around this by hinting to the TaskFactory that it's a long running task, ie Task.Factory.StartNew(() => DoWork(), TaskCreationOptions.LongRunning), which causes the scheduler to start this task on a new thread (this, again, might not be better than just having the runtime schedule it for you).
All this being said, if your work takes enough time to process, it will wind up running on a separate thread if you're queuing it with the ThreadPool. If the work happens fast enough, though, it would appear that only one thread or one core is being used.
What led you to the conclusion that you're only using one core?
Was it only what you saw from Task Manager? That's hardly a conclusive result.
Did you try adding a column to the Task Manager detail tab for threads, and actually check if your code is spawning additional threads or not?
I also see that you're iterating over the array of Bodies twice, is there any particular reason you can't update the bodies with the GameTime in parallel as well (perhaps some limitation in XNA)?
All of the above is just a shot in the dark, though. If you really, and I mean really, want to know where any performance issues lie, you'll profile your code with a decent profiler, like ANTS by Redgate, dotTrace by JetBrains, or, if you have the Premium or higher edition of Visual Studio, the profiler built right into your IDE.
I'm not sure your problem is where you think it is, and in my experience, it rarely is. I hope that some of my brain dump above can be of some help to you.
Consider a queue holding a lot of jobs that need processing. A limitation of the queue is that it can only hand out one job at a time, and there is no way of knowing how many jobs there are. The jobs take 10 seconds each to complete and involve a lot of waiting for responses from web services, so the work is not CPU-bound.
If I use something like this
while (true)
{
    var job = Queue.PopJob();
    if (job == null)
        break;
    Task.Factory.StartNew(job.Execute);
}
Then it will furiously pop jobs from the queue much faster than it can complete them, run out of memory and fall on its ass. >.<
I can't use (I don't think) ParallelOptions.MaxDegreeOfParallelism because I can't use Parallel.Invoke or Parallel.ForEach
Three alternatives I've found:

1. Replace Task.Factory.StartNew with

   Task task = new Task(job.Execute, TaskCreationOptions.LongRunning);
   task.Start();

   which seems to somewhat solve the problem, but I am not clear exactly what this is doing and whether this is the best method.

2. Create a custom task scheduler that limits the degree of concurrency.

3. Use something like BlockingCollection to add jobs to a collection when started and remove them when finished, to limit the number that can be running.
With #1 I've got to trust that the right decision is automatically made; with #2 and #3 I've got to work out the maximum number of tasks that can be running myself.
Have I understood this correctly - which is the better way, or is there another way?
EDIT - This is what I've come up with from the answers below, producer-consumer pattern.
As well as overall throughput, the aim was not to dequeue jobs faster than they could be processed, and not to have multiple threads polling the queue (not shown here, but that's a non-blocking op that would lead to huge transaction costs if polled at high frequency from multiple places).
// BlockingCollection<>(1) will block if you try to add more than 1 job to the
// queue (no point in being greedy!), or if it is empty on take.
BlockingCollection<Job> jobs = new BlockingCollection<Job>(1);

// Set up a number of consumer threads.
// Determine MAX_CONSUMER_THREADS empirically; with a 4-core CPU and 50% of job
// time spent blocked waiting on IO, it would likely be 8.
for (int numConsumers = 0; numConsumers < MAX_CONSUMER_THREADS; numConsumers++)
{
    Thread consumer = new Thread(() =>
    {
        // GetConsumingEnumerable blocks on an empty collection and completes
        // cleanly once CompleteAdding has been called and the buffer drains.
        foreach (var job in jobs.GetConsumingEnumerable())
        {
            job.Execute();
        }
    });
    consumer.Start();
}

// Producer: take items off the queue and put them in the blocking collection ready for processing.
while (true)
{
    var job = Queue.PopJob();
    if (job != null)
        jobs.Add(job);
    else
    {
        jobs.CompleteAdding();
        // May need to wait for running jobs to finish.
        break;
    }
}
I just gave an answer which is very applicable to this question.
Basically, the TPL Task class is made to schedule CPU-bound work. It is not made for blocking work.
You are working with a resource that is not CPU: you are waiting for service replies. This means the TPL will mismanage your resource, because it assumes CPU-boundedness to a certain degree.
Manage the resources yourself: Start a fixed number of threads or LongRunning tasks (which is basically the same). Decide on the number of threads empirically.
You can't put unreliable systems into production. For that reason, I recommend #1 but throttled. Don't create as many threads as there are work items. Create as many threads which are needed to saturate the remote service. Write yourself a helper function which spawns N threads and uses them to process M work items. You get totally predictable and reliable results that way.
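Such a helper might look like this (a minimal sketch, assuming Queue.PopJob() is thread-safe and returns null when empty):

void ProcessWithFixedThreads(int threadCount)
{
    var threads = new List<Thread>();
    for (int i = 0; i < threadCount; i++)
    {
        var t = new Thread(() =>
        {
            Job job;
            while ((job = Queue.PopJob()) != null) // thread-safe PopJob assumed
                job.Execute();
        });
        t.Start();
        threads.Add(t);
    }
    threads.ForEach(t => t.Join()); // wait for all workers to drain the queue
}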
Potential flow splits and continuations caused by await, later on in your code or in a 3rd party library, won't play nicely with long running tasks (or threads), so don't bother using long running tasks. In the async/await world, they're useless. More details here.
You can call ThreadPool.SetMaxThreads but before you make this call, make sure you set the minimum number of threads with ThreadPool.SetMinThreads, using values below or equal to the max ones. And by the way, the MSDN documentation is wrong. You CAN go below the number of cores on your machine with those method calls, at least in .NET 4.5 and 4.6 where I used this technique to reduce the processing power of a memory limited 32 bit service.
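For example (a sketch; the specific numbers are arbitrary):

// Set the minimums first; the maximums must be greater than or equal to them.
ThreadPool.SetMinThreads(4, 4); // worker threads, IO completion-port threads
ThreadPool.SetMaxThreads(4, 4);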
If however you don't wish to restrict the whole app but just the processing part of it, a custom task scheduler will do the job. A long time ago, MS released samples with several custom task schedulers, including a LimitedConcurrencyLevelTaskScheduler. Spawn the main processing task manually with Task.Factory.StartNew, providing the custom task scheduler, and every other task spawned by it will use it, including async/await and even Task.Yield, used for achieving asynchrony early on in an async method.
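Usage might look like this (a sketch, assuming the LimitedConcurrencyLevelTaskScheduler class from the MSDN samples is in your project; MainProcessingLoop is a hypothetical root method):

var scheduler = new LimitedConcurrencyLevelTaskScheduler(4); // at most 4 tasks run concurrently

Task.Factory.StartNew(
    MainProcessingLoop,       // the root processing task
    CancellationToken.None,
    TaskCreationOptions.None,
    scheduler);               // child tasks inherit this scheduler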
But for your particular case, neither solution will stop your queue of jobs from being exhausted before they are completed. That might not be desirable, depending on the implementation and purpose of that queue of yours. They are more like "fire a bunch of tasks and let the scheduler find the time to execute them" solutions. So perhaps something a bit more appropriate here is a stricter method of controlling the execution of the jobs via semaphores. The code would look like this:
SemaphoreSlim semaphore = new SemaphoreSlim(max_concurrent_jobs);

while (...)
{
    job = Queue.PopJob();
    semaphore.Wait();      // blocks until a slot is free
    ProcessJobAsync(job);  // fire-and-forget; the semaphore caps concurrency
}

async Task ProcessJobAsync(Job job)
{
    await Task.Yield(); // return to the caller immediately
    try
    {
        // ... Process the job here ...
    }
    finally
    {
        semaphore.Release(); // free the slot even if the job throws
    }
}
There's more than one way to skin a cat. Use what you believe is appropriate.
Microsoft has a very cool library called TPL Dataflow which does exactly what you want (and much more). Details here.
You should use the ActionBlock class and set the MaxDegreeOfParallelism of the ExecutionDataflowBlockOptions object. ActionBlock plays nicely with async/await, so even when your external calls are awaited, no new jobs will begin processing.
ExecutionDataflowBlockOptions actionBlockOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10
};

this.sendToAzureActionBlock = new ActionBlock<List<Item>>(
    async items => await ProcessItems(items),
    actionBlockOptions);
...
this.sendToAzureActionBlock.Post(itemsToProcess);
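When all items have been posted, the usual Dataflow pattern (my note, not part of the original answer) is to signal completion and wait for the block to drain:

this.sendToAzureActionBlock.Complete();       // tell the block no more items are coming
await this.sendToAzureActionBlock.Completion; // completes when all posted items have been processed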
The problem here doesn't seem to be too many running Tasks, it's too many scheduled Tasks. Your code will try to schedule as many Tasks as it can, no matter how fast they are executed. And if you have too many jobs, this means you will get OOM.
Because of this, none of your proposed solutions will actually solve your problem. If it seems that simply specifying LongRunning solves your problem, then that's most likely because creating a new Thread (which is what LongRunning does) takes some time, which effectively throttles getting new jobs. So, this solution only works by accident, and will most likely lead to other problems later on.
Regarding the solution, I mostly agree with usr: the simplest solution that works reasonably well is to create a fixed number of LongRunning tasks and have one loop that calls Queue.PopJob() (protected by a lock if that method is not thread-safe) and Execute()s the job.
UPDATE: After some more thinking, I realized the following attempt will most likely behave terribly. Use it only if you're really sure it will work well for you.
But the TPL tries to figure out the best degree of parallelism, even for IO-bound Tasks, so you might try to use that to your advantage. LongRunning Tasks won't work here, because from the TPL's point of view it seems like no work is being done and it will start new Tasks over and over. What you can do instead is start a new Task at the end of each Task. This way, the TPL will know what's going on and its algorithm may work well. Also, to let the TPL decide the degree of parallelism: at the start of a Task that is the first in its line, start another line of Tasks.
This algorithm may work well. But it's also possible that the TPL will make a bad decision regarding the degree of parallelism, I haven't actually tried anything like this.
In code, it would look like this:
void ProcessJobs(bool isFirst)
{
    var job = Queue.PopJob(); // assumes PopJob() is thread-safe
    if (job == null)
        return;

    if (isFirst)
        Task.Factory.StartNew(() => ProcessJobs(true)); // spawn a new "line" of tasks

    job.Execute();

    Task.Factory.StartNew(() => ProcessJobs(false)); // continue this line
}
And start it with
Task.Factory.StartNew(() => ProcessJobs(true));
TaskCreationOptions.LongRunning is useful for blocking tasks, and using it here is legitimate. What it does is suggest to the scheduler that it dedicate a thread to the task. The scheduler itself tries to keep the number of threads at the same level as the number of CPU cores, to avoid excessive context switching.
It is well described in Threading in C# by Joseph Albahari
I use a message queue/mailbox mechanism to achieve this. It's akin to the actor model. I have a class that has a MailBox. I call this class my "worker." It can receive messages. Those messages are queued and they, essentially, define tasks that I want the worker to run. The worker will use Task.Wait() for its Task to finish before dequeueing the next message and starting the next task.
By limiting the number of workers I have, I am able to limit the number of concurrent threads/tasks that are being run.
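A stripped-down sketch of that mailbox idea (my illustration; synchronous message execution stands in for the Task.Wait() in the real code, and the linked post has the full version):

// Uses System.Collections.Concurrent and System.Threading.
class Worker
{
    private readonly BlockingCollection<Action> _mailbox = new BlockingCollection<Action>();

    public Worker()
    {
        // One dedicated thread drains the mailbox, one message at a time,
        // so each task runs to completion before the next is dequeued.
        var thread = new Thread(() =>
        {
            foreach (var message in _mailbox.GetConsumingEnumerable())
                message();
        });
        thread.IsBackground = true;
        thread.Start();
    }

    public void Post(Action message)
    {
        _mailbox.Add(message);
    }
}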
This is outlined, with source code, in my blog post on a distributed compute engine. If you look at the code for IActor and the WorkerNode, I hope it makes sense.
https://long2know.com/2016/08/creating-a-distributed-computing-engine-with-the-actor-model-and-net-core/
I would like to implement a thread pool in Java, which can dynamically resize itself based on the computational and I/O behavior of the tasks submitted to it.
Practically, I want to achieve the same behavior as the new Thread Pool implementation in C# 4.0
Is there an implementation already or can I achieve this behavior by using mostly existing concurrency utilities (e.g. CachedThreadPool)?
The C# version does self-instrumentation to achieve optimal utilization. What self-instrumentation is available in Java, and what performance implications does it present?
Is it feasible to do a cooperative approach, where the task signals its intent (e.g. entering I/O intensive operation, entering CPU intensive operation phase)?
Any suggestions are welcome.
Edit Based on comments:
The target scenarios could be:
Local file crawling and processing
Web crawling
Multi-webservice access and aggregation
The problem with the CachedThreadPool is that it starts new threads when all existing threads are blocked; you need to set explicit bounds on it, but that's it.
For example, I have 100 web services to access in a row. If I submit 100 tasks to a CachedThreadPool, it will start 100 threads to perform the operations, and the flood of simultaneous I/O requests and data transfers will surely trip over each other's feet. For a static test case I would be able to experiment and find the optimal pool size, but I want it to be determined and applied adaptively.
Consider creating a Map where the key is the bottleneck resource.
Each task submitted to the pool will declare the resource that is its bottleneck, i.e. "CPU", "Network", "C:\", etc.
You could start by allowing only one thread per resource and then maybe slowly ramp up until work completion rate stops increasing. Things like CPU could have a floor of the core count.
Let me present an alternative approach. Having a single thread pool is a nice abstraction, but it's not very performant, especially when the jobs are very IO-bound - then there's no good way to tune it, it's tempting to blow up the pool size to maximize IO throughput but you suffer from too many thread switches, etc.
Instead I'd suggest looking at the architecture of the Apache MINA framework for inspiration. (http://mina.apache.org/) It's a high-performance web framework - they describe it as a server framework, but I think their architecture works well for inverse scenarios as well, like spidering and multi-server clients. (Actually, you might even be able to use it out-of-the-box for your project.)
They use the Java NIO (non-blocking I/O) libraries for all IO operations, and divide up the work into two thread pools: a small and fast set of socket threads, and a larger and slower set of business logic threads. So the layers look as follows:
On the network end, a large set of NIO channels, each with a message buffer
A small pool of socket threads, which go through the channel list round-robin. Their only job is to check the socket, and move any data out into the message buffer - and if the message is done, close it out and transfer to the job queue. These guys are fast, because they just push bits around, and skip any sockets that are blocked on IO.
A single job queue that serializes all messages
A large pool of processing threads, which pull messages off the queue, parse them, and do whatever processing is required.
This makes for very good performance - IO is separated out into its own layer, and you can tune the socket thread pool to maximize IO throughput, and separately tune the processing thread pool to control CPU/resource utilization.
The example given is
Result[] a = new Result[N];
for (int i = 0; i < N; i++) {
    a[i] = compute(i);
}
In Java, the way to parallelize this across every free core, with the workload distributed dynamically so it doesn't matter if one task takes longer than another, is:
// defined earlier
int procs = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(procs);

// main loop.
List<Future<Result>> futures = new ArrayList<Future<Result>>(N);
for (int i = 0; i < N; i++) {
    final int i2 = i;
    futures.add(service.submit(new Callable<Result>() {
        public Result call() {
            return compute(i2);
        }
    }));
}

Result[] a = new Result[N];
for (int i = 0; i < N; i++)
    a[i] = futures.get(i).get(); // get() throws InterruptedException/ExecutionException
service.shutdown();
This hasn't changed much in the last 5 years, so it's not as cool as it was when it was first available. What Java really lacks is closures. You can use Groovy instead if that is really a problem.
Additional: if you cared about performance, rather than this being an example, you would not calculate Fibonacci in parallel, because it's a good example of a function that is faster if you calculate it single-threaded.
One difference is that each thread pool only has one queue, so there is no need to steal work. This potentially means that you have more overhead per task. However, as long as your tasks typically take more than about 10 micro-seconds it won't matter.
I think you should monitor CPU utilization, in a platform-specific manner. Find out how many CPUs/cores you have, and monitor the load. When you find that the load is low, and you still have more work, create new threads - but not more than x times num-cpus (say, x=2).
If you really want to consider IO threads also, try to find out what state each thread is in when your pool is exhausted, and deduct all waiting threads from the total number. One risk is that you exhaust memory by admitting too many tasks, though.