processor usage (force full usage)


First of all, I did not study computer science; I taught myself programming. That said:
I have a C# program that runs heavy power flow simulations for very large demand profiles.
I use a laptop with an Intel i7 processor (4 cores -> 8 threads) under Windows 7.
When I run the simulations, the processor usage is around 32%.
I have read other threads about process priority, and my understanding is that when something runs on the OS it runs at full speed, while the OS keeps the interfaces responsive (is this correct?).
I want to "completely flood the processor" with the simulation. Is it possible to get 100% usage?
Thanks in advance.
Ref#1: Is there a way of restricting an API's processor resource in c#?
Ref#2: Multiple Processors and PerformanceCounter C#
EDIT: piece of code that calls the simulations after removing the non relevant stuff
while ( current_step < sim_times.Count ) {
    bool repeat_all = false;
    power_flow( sim_times[current_step] );
    current_step++;
}
I know it is super simple; it is a while loop because in the original code I may want to repeat a certain number of steps.
The power_flow() function calls third-party software, so I guess it is this third-party software that should do the multithreading, isn't it?

You can't really force full usage - you need to provide more work for the processor to do. You could do this by increasing the number of threads to process more data in parallel. If you provide samples of your source code, we could give specific advice on how to alter it to achieve this.
If you are using a third party piece of software for data processing, this often makes it difficult to split into multiple threads. One tactic that's often helpful is to split up your input data, then start a new thread for each data set. This requires domain specific knowledge to know what you can split up. For simulations, once you have split up one run as much as possible, an alternative is to process multiple runs in parallel.
The Task Parallel Library can be really useful to break down your code into multiple threads without much refactoring. Particularly the data parallelism section.
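As a rough illustration of the data parallelism idea, here is a minimal sketch. It assumes that each time step is independent and that the third-party call is safe to invoke from several threads at once, neither of which is confirmed by the question. PowerFlow below is a hypothetical stand-in that only burns CPU; it is not the real power_flow().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelSimulationSketch
{
    // Hypothetical stand-in for the third-party power_flow() call.
    // It only burns CPU so that the example is self-contained.
    static double PowerFlow(double simTime)
    {
        double acc = 0;
        for (int i = 1; i <= 200000; i++)
            acc += Math.Sqrt(simTime + i);
        return acc;
    }

    // Runs every time step in parallel and returns the results in input order.
    public static double[] RunAll(IList<double> simTimes)
    {
        var results = new double[simTimes.Count];
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, simTimes.Count, i =>
        {
            results[i] = PowerFlow(simTimes[i]);
        });
        return results;
    }

    static void Main()
    {
        var simTimes = Enumerable.Range(0, 64).Select(i => (double)i).ToList();
        double[] results = RunAll(simTimes);
        Console.WriteLine("Completed {0} steps", results.Length);
    }
}
```

If the steps depend on each other, or the third-party library keeps shared state, this pattern does not apply as written.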
One big note of caution - you need to make sure what you're doing is thread-safe. I'll provide some further reading for you. The basic principle is that if you're sharing data between threads, you need to be very careful that they don't affect one another - this can cause bizarre problems!
As for your question regarding interfaces - within your process you can allocate thread priority to each thread. An interface thread is just a thread like any other. Usually a UI thread is given the highest priority to remain responsive, whereas a long background process is given a normal/below normal priority as it can be processed during idle time. You can set the priority manually, the default for any new thread is Normal.
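To make the priority point concrete, here is a small sketch that starts work on a dedicated thread with a below-normal priority. The helper name StartBackgroundWork and the dummy workload are assumptions made for illustration only.

```csharp
using System;
using System.Threading;

static class PrioritySketch
{
    // Starts the work on a dedicated thread with the given priority and returns the thread.
    public static Thread StartBackgroundWork(Action work, ThreadPriority priority)
    {
        var worker = new Thread(() => work())
        {
            IsBackground = true,   // do not keep the process alive on exit
            Priority = priority    // new threads default to ThreadPriority.Normal
        };
        worker.Start();
        return worker;
    }

    static void Main()
    {
        Thread t = StartBackgroundWork(() =>
        {
            // Dummy CPU-bound workload standing in for a long simulation.
            double acc = 0;
            for (int i = 1; i < 5000000; i++) acc += Math.Sqrt(i);
            Console.WriteLine("Done: {0:F0}", acc);
        }, ThreadPriority.BelowNormal);

        t.Join(); // wait for the simulated work to finish
    }
}
```

Lowering the priority does not reduce how much CPU the work can use when the machine is otherwise idle; it only affects which thread wins when several are ready to run.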

You should process these simulations in parallel so that you use as many CPUs as possible. Do this by creating a Task for each simulation run.
using System.Collections.Generic;
using System.Threading.Tasks;
...
List<Task> tasks = new List<Task>();
for (; current_step < sim_times.Count; current_step++)
{
    var simTime = sim_times[current_step]; // copy the sim parameter so each lambda captures its own value
    // Create a 'hot' task (one that is immediately scheduled for execution)
    // and keep a reference to it so that we can wait on it below.
    tasks.Add(Task.Factory.StartNew(() => power_flow(simTime)));
}
Task.WaitAll(tasks.ToArray()); // wait for the simulations to finish so that you can process the results
Data Parallelism (Task Parallel Library)

Related

What is a multithreading program and how does it work?

What is a multithreading program and how does it work exactly? I read some documents but I'm confused. I know that code is executed line by line, but I can't understand how the program manages this.
A simple answer would be appreciated. C# example please (only animation!)

What is a multi-threading program and how does it work exactly?
The interesting part about this question is that complete books have been written on the topic, yet it remains elusive to a lot of people. I will try to explain it in the order detailed underneath.
Please note this is just to provide the gist; an answer like this can never do justice to the depth and detail required. Regarding videos, the best that I have come across are part of paid subscriptions (Wintellect and Pluralsight). Check whether you can watch them on a trial basis, assuming you don't already have a subscription:
Wintellect by Jeffrey Richter (from his book CLR via C#, which has the same chapter on Thread Fundamentals)
CLR Threading by Mike Woodring
Explanation Order
What is a thread ?
Why were threads introduced, main purpose ?
Pitfalls and how to avoid them, using Synchronization constructs ?
Thread Vs ThreadPool ?
Evolution of Multi threaded programming API, like Parallel API, Task API
Concurrent Collections, usage ?
Async-Await, thread but no thread, why they are best for IO
What is a thread ?
A thread is a software construct managed by the operating system; it is the bare minimum unit of work that can be scheduled. Every process on Windows has at least one thread, and every method call is executed on a thread. Each process can have multiple threads to do multiple things in parallel (provided the hardware supports it).
Unix-based operating systems have traditionally leaned more on a multi-process architecture, although they support threads too. On Windows, even the most complex pieces of software, like Oracle.exe, run as a single process with multiple threads for different critical background operations.
Why were threads introduced, main purpose ?
Contrary to the perception that concurrency is the main purpose, it was robustness that led to the introduction of threads. Imagine every process on Windows running on the same thread (as in the initial 16-bit version): if one of those processes crashes, in most cases that simply means a system restart to recover. The use of threads for concurrent operations, since several of them can be started in each process, came into the picture later. In fact, threads are also essential for utilizing a multi-core processor to its full ability.
Pitfalls and how to avoid them, using Synchronization constructs ?
More threads mean more work completed concurrently, but problems arise when the same memory is accessed by several of them, especially for writes, as that is when it can lead to:
Memory corruption
Race condition
Another issue is that a thread is a very costly resource: each thread has a thread environment block and a kernel memory allocation. Time is also spent on context switching when scheduling each thread on a processor core. It is quite possible that misuse causes a huge performance penalty instead of an improvement.
To avoid thread-related corruption issues, it's important to use synchronization constructs like lock, mutex, or semaphore, depending on the requirement. Concurrent reads of data that nothing is modifying are thread safe, but writes need appropriate synchronization.
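A small sketch of the write problem described above: many parallel iterations increment one shared counter, once under a lock and once without. The class and method names are made up for this illustration.

```csharp
using System;
using System.Threading.Tasks;

static class LockSketch
{
    static readonly object Gate = new object();

    // Increments a shared counter from many iterations in parallel.
    // With useLock == false the read-modify-write races and updates can be lost.
    public static int Count(int iterations, bool useLock)
    {
        int counter = 0;
        Parallel.For(0, iterations, i =>
        {
            if (useLock)
            {
                lock (Gate) { counter++; } // only one thread at a time may update
            }
            else
            {
                counter++; // unsynchronized write: a data race
            }
        });
        return counter;
    }

    static void Main()
    {
        Console.WriteLine("With lock:    {0}", Count(1000000, true));
        Console.WriteLine("Without lock: {0}", Count(1000000, false));
    }
}
```

The locked run always reports the full iteration count. The unlocked run may report less on a multi-core machine, and the exact number varies from run to run.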
Thread Vs ThreadPool ?
The threads we use in C#/.NET are not the real threads; they are just a managed wrapper for invoking Win32 threads. The challenge lies in the user's ability to grossly misuse them, for example by starting far more threads than required or by assigning processor affinity. So isn't it better to ask a standard pool to queue the work item and let the pool decide when a new thread is required and when an already existing thread can run the work item? A thread is a costly resource whose usage needs to be optimized, else it can be a bane rather than a boon.
Evolution of Multi threaded programming, like Parallel API, Task API
From .NET 4.0 onward, a variety of new APIs, such as Parallel.For and Parallel.ForEach for data parallelization and the Task API for task parallelization, have made it very simple to introduce concurrency into a system. These APIs again use a thread pool internally. A Task is more like scheduling a piece of work for some time in the future. Introducing concurrency is now a breeze, though synchronization constructs are still required to avoid memory corruption and race conditions; alternatively, thread-safe collections can be used.
Concurrent Collections, usage ?
Implementations like ConcurrentBag, ConcurrentQueue, and ConcurrentDictionary, part of System.Collections.Concurrent, are inherently thread safe. They use techniques such as spin-waiting internally and are much easier and quicker to use than explicit synchronization, as well as easier to manage. There's another set of APIs, like ImmutableList in System.Collections.Immutable, available via NuGet, which are thread safe by virtue of creating another copy of the data structure internally.
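For instance, a ConcurrentDictionary can be updated from a parallel loop without any explicit lock. The following is a minimal sketch; the Histogram helper is a made-up example, not something from the answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ConcurrentCollectionSketch
{
    // Counts how many numbers in [0, n) fall into each remainder class modulo 'buckets'.
    public static ConcurrentDictionary<int, int> Histogram(int n, int buckets)
    {
        var counts = new ConcurrentDictionary<int, int>();
        Parallel.For(0, n, i =>
        {
            // AddOrUpdate inserts 1 for a new key or applies the update function
            // to the existing value; the dictionary retries internally on contention.
            counts.AddOrUpdate(i % buckets, 1, (key, old) => old + 1);
        });
        return counts;
    }

    static void Main()
    {
        var counts = Histogram(1000, 4);
        foreach (var pair in counts)
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

The update delegate may be invoked more than once under contention, so it should be free of side effects.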
Async-Await, thread but no thread, why they are best for IO
This is an important aspect of concurrency meant for IO calls (disk, network). The other APIs discussed so far are meant for compute-based concurrency, where threads are important and make things faster. For IO calls, however, a thread has no use except waiting for the call to return; IO calls are processed on a hardware-based queue, via IO completion ports.
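A minimal sketch of the idea, assuming .NET 4.5 or later: Task.Delay stands in for a real disk or network call (an assumption made to keep the example self-contained), and many such operations are in flight at once without a thread being blocked for each of them.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class AsyncIoSketch
{
    // Task.Delay stands in for a disk or network call; no thread is blocked while it is pending.
    static async Task<int> FakeIoAsync(int id, int milliseconds)
    {
        await Task.Delay(milliseconds);
        return id;
    }

    // Starts all operations first, then awaits them together.
    public static async Task<int[]> RunAllAsync(int count, int milliseconds)
    {
        var pending = new Task<int>[count];
        for (int i = 0; i < count; i++)
            pending[i] = FakeIoAsync(i, milliseconds);
        return await Task.WhenAll(pending);
    }

    static void Main()
    {
        var watch = Stopwatch.StartNew();
        // Blocking here is only for the console demo; real code would await instead.
        int[] ids = RunAllAsync(100, 200).GetAwaiter().GetResult();
        Console.WriteLine("{0} operations in {1} ms", ids.Length, watch.ElapsedMilliseconds);
    }
}
```

Because the waits overlap, the total time should be much closer to a single delay than to the sum of all of them.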

A simple analogy might be found in the kitchen.
You've probably cooked using a recipe before -- start with the specified ingredients, follow the steps indicated in the recipe, and at the end you (hopefully) have a delicious dish ready to eat. If you do that, then you have executed a traditional (non-multithreaded) program.
But what if you have to cook a full meal, which includes a number of different dishes? The simple way to do it would be to start with the first recipe, do everything the recipe says, and when it's done, put the finished dish (and the first recipe) aside, then start on the second recipe, do everything it says, put the second dish (and second recipe) aside, and so on until you've gone through all of the recipes one after another. That will work, but you might end up spending 10 hours in the kitchen, and of course by the time the last dish is ready to eat, the first dish might be cold and unappetizing.
So instead you'd probably do what most chefs do, which is to start working on several recipes at the same time. For example, you might put the roast in the oven for 45 minutes, but instead of sitting in front of the oven waiting 45 minutes for the roast to cook, you'd spend the 45 minutes chopping the vegetables. When the oven timer rings, you put down your vegetable knife, pull the cooked roast out of the oven and let it cool, then go back to chopping vegetables, and so on. If you can do that, then you are successfully multitasking several recipes/programs. That is, you aren't literally working on multiple recipes at once (you still have only two hands!), but you are jumping back and forth from following one recipe to following another whenever necessary, and thereby making progress on several tasks rather than twiddling your thumbs a lot. Do this well and you can have the whole meal ready to eat in a much shorter amount of time, and everything will be hot and fresh at about the same time too. If you do this, you are executing a simple multithreaded program.
Then if you wanted to get really fancy, you might hire a few other chefs to work in the kitchen at the same time as you, so that you can get even more food prepared in a given amount of time. If you do this, your team is doing multiprocessing, with each chef taking one part of the total work and all of them working simultaneously. Note that each chef may well be working on multiple recipes (i.e. multitasking) as described in the previous paragraph.
As for how a computer does this sort of thing (no more analogies about chefs), it usually implements it using a list of ready-to-run threads and a timer. When the timer goes off (or when the thread that is currently executing has nothing to do for a while, because e.g. it is waiting to load data from a slow hard drive or something), the operating system does a context switch, in which it pauses the current thread (by putting it into a list somewhere and no longer executing instructions from that thread's code), then pulls another ready-to-run thread from the list of ready-to-run threads and starts executing instructions from that thread's code instead. This repeats for as long as necessary, often with context switches happening every few milliseconds, giving the illusion that multiple programs are running "at the same time" even on a single-core CPU. (On a multi-core CPU it does this same thing on each core, and in that case it's no longer just an illusion; multiple programs really are running at the same time.)

Why don't you refer to Microsoft's very own documentation of the .net class System.Threading.Thread?
It has a handful of simple example programs written in C# (at the bottom of the page), just as you asked for:
Thread Examples


Multithreading means doing multiple pieces of work at the same time, so that they can complete in parallel. You can hand a task off from your main thread, have it execute on another thread, and be done.

System.Threading.ThreadPool excluding a core?

I have a multi-core computer that executes processes using the .NET 4.5.2 System.Threading.ThreadPool class. The processes are long-running math computations that might execute for days or even weeks. I do not want to create my own thread pools; using System.Threading.ThreadPool is very nice. However, on my multi-core computer the thread pool manager is very greedy and load balances across all of the cores. Unfortunately, my processes on each core are also greedy: each one wants to monopolize its core and execute at 100% until it completes its assignment. I'm fine with this, except that the operating system freezes up. Literally, I can't move the mouse around and interact with the desktop. What I would like to do is reserve one of the cores for the operating system, so that the mouse and GUI stay responsive. It seems logical to exclude a core (and its available threads) for the OS to operate.
Does anyone know how to accomplish this using System.Threading.ThreadPool?
****ANSWER****
To begin, my question is faulty. This was due to my inexperience with the subject. Second, if your Google search brought you to this question, it means that your thinking is also faulty, as evidenced by your search terms. But this is good. Now you can learn the proper way. And here it is.
The short answer to this question: System.Threading.ThreadPool cannot solve your issue.
A slightly better answer: The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces in the .NET Framework 4.0. The TPL scales the degree of concurrency dynamically to efficiently use all the cores that are available. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.
Good luck and happy coding to you!
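For readers who still want to leave some headroom for the desktop, one option within the TPL is to cap the degree of parallelism at one less than the number of logical cores. The sketch below is only an illustration under that assumption: it limits how many iterations run at once, but it does not pin work to cores or reserve a specific core for the OS, and the class and method names are invented for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LeaveOneCoreSketch
{
    // Runs work(i) for i in [0, count) with at most (logical cores - 1) iterations
    // in flight at once, and returns the highest concurrency actually observed.
    public static int Run(int count, Action<int> work)
    {
        int limit = Math.Max(1, Environment.ProcessorCount - 1);
        var options = new ParallelOptions { MaxDegreeOfParallelism = limit };
        var gate = new object();
        int running = 0, peak = 0;

        Parallel.For(0, count, options, i =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            work(i);
            lock (gate) { running--; }
        });
        return peak;
    }

    static void Main()
    {
        // Thread.SpinWait stands in for a long CPU-bound computation.
        int peak = Run(64, i => Thread.SpinWait(5000000));
        Console.WriteLine("Logical cores: {0}, peak concurrency: {1}",
            Environment.ProcessorCount, peak);
    }
}
```

The peak tracking exists only to show the cap at work; real code would pass the ParallelOptions and drop the bookkeeping.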

@ScottChamberlain: because the math operation relies on extracting a value from an external network resource. That resource may be blocked for reasons I can't control (perhaps another user is accessing it, blocking until they release it). – sapbucket
What you need to do is separate getting the data from the network resource from the processing of that data. Use a BlockingCollection as a buffer to pipeline between your two parts.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
...
BlockingCollection<YourData> _collection = new BlockingCollection<YourData>();

public void ProcessInParallel()
{
    // This starts up your workers: one per logical core, minus one (but at least one).
    int workerCount = Math.Max(1, Environment.ProcessorCount - 1);
    var tasks = new List<Task>();
    for (int i = 0; i < workerCount; i++)
    {
        var task = Task.Factory.StartNew(DataProcessorLoop, TaskCreationOptions.LongRunning);
        tasks.Add(task);
    }

    // This part could be done in parallel too; _collection.Add is fine with multiple threads calling it.
    foreach (YourData data in GetYourDataFromTheNetwork())
    {
        // Put data into the collection; the first worker available will take it.
        _collection.Add(data);
    }

    // Let the consumers know you will not be adding any more data.
    _collection.CompleteAdding();

    // Wait for all of the worker tasks to drain the collection and finish.
    Task.WaitAll(tasks.ToArray());
}

private void DataProcessorLoop()
{
    // This pulls data from the collection of work to do. When there is no work to do,
    // it blocks until more work shows up or CompleteAdding is called.
    foreach (YourData data in _collection.GetConsumingEnumerable())
    {
        CrunchData(data);
    }
}

Launching multiple tasks from a WCF service

I need to optimize a WCF service... it's quite a complex thing. My problem this time has to do with tasks (Task Parallel Library, .NET 4.0). What happens is that I launch several tasks when the service is invoked (using Task.Factory.StartNew) and then wait for them to finish:
Task.WaitAll(task1, task2, task3, task4, task5, task6);
Ok... what I see, and don't like, is that on the first call (sometimes the first 2-3 calls, if made quickly one after another), the final task starts much later than the others (I am looking at a case where it started 0.5 seconds after the others). I tried calling
ThreadPool.SetMinThreads(12*Environment.ProcessorCount, 20);
at the beginning of my service, but it doesn't seem to help.
The tasks are all database-related: I'm reading from multiple databases and it has to take as little time as possible.
Any idea why the last task is taking so long? Is there something I can do about it?
Alternatively, should I use the thread pool directly? As it happens, in one case I'm looking at, one task had already ended before the last one started - I would have saved 0.2 seconds if I had reused that thread instead of waiting for a new one to be created. However, I cannot be sure that that task will always end so quickly, so I can't put both requests in the same task.
[Edit] The OS is Windows Server 2003, so there should be no connection limit. Also, it is hosted in IIS - I don't know if I should create regular threads or using the thread pool - which is the preferred version?
[Edit] I've also tried using Task.Factory.StartNew(action, TaskCreationOptions.LongRunning); - it doesn't help, the last task still starts much later (around half a second later) than the rest.
[Edit] MSDN says:
"The thread pool has a built-in delay (half a second in the .NET Framework version 2.0) before starting new idle threads. If your application periodically starts many tasks in a short time, a small increase in the number of idle threads can produce a significant increase in throughput. Setting the number of idle threads too high consumes system resources needlessly."
However, as I said, I'm already calling SetMinThreads and it doesn't help.

I have had problems myself with delays in thread startup when using the (.Net 4.0) Task-object. So for time-critical stuff I now use dedicated threads (... again, as that is what I was doing before .Net 4.0.)
The purpose of a thread pool is to avoid the operating system cost of starting and stopping threads. The threads are simply reused. This is a common model found in, for example, internet servers. The advantage is that they can respond more quickly.
I've written many applications where I implement my own thread pool by having dedicated threads pick up tasks from a task queue. Note, however, that this most often requires locking, which can cause delays/bottlenecks. This depends on your design: if the tasks are small, there will be a lot of locking, and it might be faster to trade some CPU for less locking: http://www.boyet.com/Articles/LockfreeStack.html
SmartThreadPool is a replacement/extension of the .Net thread pool. As you can see in this link it has a nice GUI to do some testing: http://www.codeproject.com/KB/threads/smartthreadpool.aspx
In the end it depends on what you need, but for high performance I recommend implementing your own thread pool. If you experience a lot of thread idling then it could be beneficial to increase the number of threads (beyond the recommended cpucount*2). This is actually how HyperThreading works inside the CPU - using "idle" time while doing operations to do other operations.
Note that .Net has a built-in limit of 25 threads per process (ie. for all WCF-calls you receive simultaneously). This limit is independent and overrides the ThreadPool setting. It can be increased, but it requires some magic: http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=201

Following from my prior question (yep, should have been a Q against original message - apologies):
Why do you feel that creating 12 threads for each processor core in your machine will in some way speed-up your server's ability to create worker threads? All you're doing is slowing your server down!
As per the MSDN docs: "You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance."
Issues like this are usually caused by bumping into limits or contention on a shared resource.
In your case, I am guessing that your last task(s) is/are blocking while they wait for a connection to the DB server to come available or for the DB to respond. Remember - if your invocation kicks off 5-6 other tasks then your machine is going to have to create and open numerous DB connections and is going to kick the DB with, potentially, a lot of work. If your WCF server and/or your DB server are cold, then your first few invocations are going to be slower until the machine's caches etc., are populated.
Have you tried adding a little tracing/logging using the stopwatch to time how long it takes for your tasks to connect to the DB server and then execute their operations?
You may find that reducing the number of concurrent tasks you kick off actually speeds things up. Try spawning 3 tasks at a time, waiting for them to complete and then spawn the next 3.
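The last two suggestions (timing each task with a stopwatch, and starting the tasks in batches of three) could be combined roughly as follows. This is only a sketch: the queries are represented by plain Action delegates, and Thread.Sleep stands in for a blocking database call, both of which are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class BatchedQueriesSketch
{
    // Runs the queries in batches of batchSize and returns the elapsed milliseconds per query.
    public static long[] RunInBatches(IList<Action> queries, int batchSize)
    {
        var elapsed = new long[queries.Count];
        for (int start = 0; start < queries.Count; start += batchSize)
        {
            int end = Math.Min(start + batchSize, queries.Count);
            var batch = new List<Task>();
            for (int i = start; i < end; i++)
            {
                int index = i; // copy: the lambda must not capture the loop variable
                batch.Add(Task.Factory.StartNew(() =>
                {
                    var watch = Stopwatch.StartNew();
                    queries[index]();
                    elapsed[index] = watch.ElapsedMilliseconds;
                }));
            }
            Task.WaitAll(batch.ToArray()); // finish this batch before starting the next
        }
        return elapsed;
    }

    static void Main()
    {
        // Thread.Sleep stands in for a blocking database call.
        var queries = Enumerable.Range(0, 6)
            .Select(i => (Action)(() => Thread.Sleep(100)))
            .ToList();
        long[] times = RunInBatches(queries, 3);
        for (int i = 0; i < times.Length; i++)
            Console.WriteLine("query {0}: {1} ms", i, times[i]);
    }
}
```

Comparing the per-query timings for different batch sizes is one way to see whether the database, rather than thread startup, is the bottleneck.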

When you call Task.Factory.StartNew, it uses a TaskScheduler to map those tasks into actual work items.
In your case, it sounds like one of your Tasks is delaying occasionally while the OS spins up a new Thread for the work item. You could, potentially, build a custom TaskScheduler which already contained six threads in a wait state, and explicitly used them for these six tasks. This would allow you to have complete control over how those initial tasks were created and started.
That being said, I suspect there is something else at play here... You mentioned that using TaskCreationOptions.LongRunning demonstrates the same behavior. This suggests that there is some other factor at play causing this half second delay. The reason I suspect this is due to the nature of TaskCreationOptions.LongRunning - when using the default TaskScheduler (LongRunning is a hint used by the TaskScheduler class), starting a task with TaskCreationOptions.LongRunning actually creates an entirely new (non-ThreadPool) thread for that Task. If creating 6 tasks, all with TaskCreationOptions.LongRunning, demonstrates the same behavior, you've pretty much guaranteed that the problem is NOT the default TaskScheduler, since this is going to always spin up 6 threads manually.
I'd recommend running your code through a performance profiler, and potentially the Concurrency Visualizer in VS 2010. This should help you determine exactly what is causing the half second delay.

What is the OS? If you are not running the server versions of windows, there is a connection limit. Your many threads are probably being serialized because of the connection limit.
Also, I have not used the task parallel library yet, but my limited experience is that new threads are cheap to make in the context of networking.

These articles might explain the problem you're having:
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-are-wcf-responses-slow-and-setminthreads-does-not-work.aspx
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-does-wcf-become-slow-after-being-idle-for-15-seconds.aspx
Seeing as you're using .NET 4, the first article probably doesn't apply. However, as the second article points out, the ThreadPool terminates idle threads after 15 seconds, which might explain the problem you're having; the article offers a simple (though a little hacky) solution to get around it.
Whether or not you should be using the ThreadPool directly wouldn't make any difference as I suspect the task library is using it for you underneath anyway.
One third-party library we have been using for a while might help you here - Smart Thread Pool. You still get the same benefits of using the task libraries, in that you can have the return values from the threads and get any exception information from them too.
Also, you can instantiate separate thread pools, so that when you have multiple places each needing a thread pool, a low-priority process doesn't start eating into the quota of some high-priority process. And you can set the priority of the threads in the pool too, which you can't do with the standard ThreadPool, where all the threads are background threads.
You can find plenty of info on the codeplex page, I've also got a post which highlights some of the key differences:
http://theburningmonk.com/2010/03/threading-introducing-smartthreadpool/
Just on a side note, for tasks like the one you've mentioned, which might take some time to return, you probably shouldn't be using the threadpool anyway. It's recommended that we should avoid using the threadpool for any blocking tasks like that because it hogs up the threadpool which is used by all sorts of things by the framework classes, like handling timer events, etc. etc. (not to mention handling incoming WCF requests!). I feel like I'm spamming here but here's some of the info I've gathered around the use of the threadpool and some useful links at the bottom:
http://theburningmonk.com/2010/03/threading-using-the-threadpool-vs-creating-your-own-threads/
well, hope this helps!

Multithreading on a multi core machines not maxing CPU

I am working on maintaining someone else's code that is using multithreading, via two methods:
1: ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem)
2: Dim aThread As New Thread(AddressOf LoadCache)
aThread.Start()
However, on a dual-core machine, I am only getting 50% CPU utilization, and on a dual-core machine with hyperthreading enabled, I am only getting 25% CPU utilization.
Obviously threading is extremely complicated, but this behaviour would seem to indicate that I am not understanding some simple fundamental fact?
UPDATE
The code is too horribly complex to post here unfortunately, but for reference purposes, here is roughly what happens....I have approx 500 Accounts whose data is loaded from the database into an in memory cache...each account is loaded individually, and that process first calls a long running stored procedure, followed by manipulation and caching of the returned data. So, the point of threading in this situation is that there is indeed a bottleneck hitting the database (ie: the thread will be idled for up to 30 seconds waiting for the query to return), so we thread to allow others to begin processing the data they have received from Oracle.
So, the main thread executes:
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem)
Then ReadData() proceeds to execute (exactly once):
Dim aThread As New Thread(AddressOf LoadCache)
aThread.Start()
And this is occurring in a recursive function, so QueueUserWorkItem can be executed multiple times, each of which in turn starts exactly one new thread via aThread.Start.
Hopefully that gives a decent idea of how things are happening.
So, under this scenario, should this not theoretically pin both cores, rather than maxing out at 100% on one core, while the other core is essentially idle?

That code starts one thread that will go and do something. To get more than one core working you need to start more than one thread and get them both busy. Starting a thread to do some work, and then having your main thread wait for it, won't get the task done any quicker. It is common to start a long-running task on a background thread so that the UI remains responsive, which may be what this code was intended to do, but it won't make the task get done any quicker.
@Judah Himango - I had assumed that those two lines of code were samples of how multi-threading was being achieved in two different places in the program. Maybe the OP can clarify if this is the case or if these two lines really are in the one method. If they are part of one method then we will need to see what the two methods are actually doing.
Update:
That does sound like it should max out both cores. What do you mean by recursively calling ReadData()? If each new thread only calls ReadData at or near its end to start the next thread, then that could explain the behaviour you are seeing.
I am not sure that this is actually a good idea. If the stored proc takes 30 seconds to get the data, then presumably it is placing a fair load on the database server. Running it 500 times in parallel is just going to make things worse. Obviously I don't know your database or data, but I would look at improving the performance of the stored proc.
If multi threading does look like the way forward, then I would have a loop on the main thread that calls ThreadPool.QueueUserWorkItem once for each account that needs loading. I would also remove the explicit thread creation and only use the thread pool. That way you are less likely to starve the local machine by creating too many threads.
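A rough sketch of that loop follows, written in C# to match the other examples on this page rather than the VB.NET of the question. The account ids and the loadAccount delegate are hypothetical placeholders for the real ReadData/LoadCache work.

```csharp
using System;
using System.Linq;
using System.Threading;

static class QueuePerAccountSketch
{
    // Queues one work item per account id and blocks until all of them have finished.
    // Returns how many loads completed without throwing.
    public static int LoadAll(int[] accountIds, Action<int> loadAccount)
    {
        if (accountIds.Length == 0) return 0;
        int completed = 0;
        using (var done = new CountdownEvent(accountIds.Length))
        {
            foreach (int id in accountIds)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        loadAccount((int)state);
                        Interlocked.Increment(ref completed);
                    }
                    catch (Exception ex)
                    {
                        // An unhandled exception on a pool thread would end the process.
                        Console.Error.WriteLine(ex.Message);
                    }
                    finally
                    {
                        done.Signal(); // count this item as finished either way
                    }
                }, id);
            }
            done.Wait(); // the main thread waits here until every item has signalled
        }
        return completed;
    }

    static void Main()
    {
        int[] ids = Enumerable.Range(1, 20).ToArray();
        // Thread.Sleep stands in for the stored procedure call and caching work.
        int loaded = LoadAll(ids, id => Thread.Sleep(50));
        Console.WriteLine("Loaded {0} accounts", loaded);
    }
}
```

The thread pool decides how many of the queued items actually run at once, which is what keeps the machine from being starved by hundreds of explicit threads.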

How many threads are you spinning up? It may seem primitive (wait a few years, and you won't need to do this anymore), but your code has got to figure out an optimal number of threads to start, and spin up that many. Simply running a single thread won't make things any faster, and won't pin a physical processor, though it may be good for other reasons (a worker thread to keep your UI responsive, for instance).
In many cases, you'll want to be running a number of threads equal to the number of logical cores available to you (available from Environment.ProcessorCount, I believe), but it may have some other basis. I've spun up a few dozen threads, talking to different hosts, when I've been bound by remote process latency, for instance.

Multi-Threaded and Multi-Core are two different things. Doing things Multi-Threaded often won't offer you an enormous increase in performance, sometimes quite the opposite. The Operating System might do a few tricks to spread your cpu cycles over multiple cores, but that's where it ends.
What you are looking for is Parallelism. The .NET 4.0 framework will add a lot of new features to support Parallelism. Have a sneak peek here:
http://www.danielmoth.com/Blog/2009/01/parallelising-loops-in-net-4.html

The CPU behavior would indicate that the application is only utilizing one logical processor. 50% would be one proc out of 2 (proc+proc). 25% would be one logical processor out of 4 (proc + HT + proc + HT)

How many threads do you have in total, and do you have any locks in LoadCache? A SyncLock may make a multi-threaded system act as a single thread (by design). Also, if you only spawn one thread, you will only get one worker thread.

CPU utilization is suggesting that you're only using one core; this may suggest that you've added threading to a portion where it is not beneficial (in this case, where CPU time is not a bottleneck).
If Loading the Cache or reading data happens very quickly, multi threading won't provide a massive improvement in speed performance. Similarly, if you're encountering a different bottleneck (slow bandwidth to a server, etc), it may not show up as CPU usage.

Design considerations for an adaptive thread pool in Java

I would like to implement a thread pool in Java, which can dynamically resize itself based on the computational and I/O behavior of the tasks submitted to it.
Practically, I want to achieve the same behavior as the new Thread Pool implementation in C# 4.0.
Is there an implementation already or can I achieve this behavior by using mostly existing concurrency utilities (e.g. CachedThreadPool)?
The C# version does self instrumentation to achieve an optimal utilization. What self instrumentation is available in Java and what performance implications does it present?
Is it feasible to do a cooperative approach, where the task signals its intent (e.g. entering I/O intensive operation, entering CPU intensive operation phase)?
Any suggestions are welcome.
Edit Based on comments:
The target scenarios could be:
Local file crawling and processing
Web crawling
Multi-webservice access and aggregation
The problem of the CachedThreadPool is that it starts new threads when all existing threads are blocked - you need to set explicit bounds on it, but that's it.
For example, I have 100 web services to access in a row. If I use a CachedThreadPool (CTP) for the 100 calls, it will start 100 threads to perform the operation, and the flood of simultaneous I/O requests and data transfers will surely trip over each other's feet. For a static test case I would be able to experiment and find the optimal pool size, but I want it to be determined and applied adaptively.

Consider creating a Map where the key is the bottleneck resource.
Each thread submitted to the pool will declare the resource which is its bottleneck, i.e. "CPU", "Network", "C:\" etc.
You could start by allowing only one thread per resource and then maybe slowly ramp up until work completion rate stops increasing. Things like CPU could have a floor of the core count.
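A rough sketch of that map, assuming a semaphore per resource is an acceptable way to cap concurrency. ResourceGate, rampUp and the other names are made up for illustration; the completion rates passed to rampUp are something you would have to measure yourself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical class for illustration: caps concurrency per bottleneck resource.
final class ResourceGate {
    private static final class Gate {
        final Semaphore permits;
        final AtomicInteger limit;
        Gate(int initial) {
            permits = new Semaphore(initial);
            limit = new AtomicInteger(initial);
        }
    }

    private final Map<String, Gate> gates = new ConcurrentHashMap<>();
    private final int cpuFloor;

    ResourceGate(int cpuFloor) {
        this.cpuFloor = cpuFloor;
    }

    // A resource starts with one permit, except "CPU", which starts at the core count.
    private Gate gateFor(String resource) {
        return gates.computeIfAbsent(resource, r -> new Gate("CPU".equals(r) ? cpuFloor : 1));
    }

    int limit(String resource) {
        return gateFor(resource).limit.get();
    }

    // Ramp-up step: allow one more concurrent task, but only while throughput still improves.
    boolean rampUp(String resource, double previousRate, double currentRate) {
        if (currentRate <= previousRate) {
            return false;
        }
        Gate g = gateFor(resource);
        g.limit.incrementAndGet();
        g.permits.release();
        return true;
    }

    // Runs the task while holding a permit for its declared bottleneck resource.
    <T> T run(String resource, Supplier<T> task) {
        Gate g = gateFor(resource);
        g.permits.acquireUninterruptibly();
        try {
            return task.get();
        } finally {
            g.permits.release();
        }
    }

    public static void main(String[] args) {
        ResourceGate gate = new ResourceGate(Runtime.getRuntime().availableProcessors());
        System.out.println("CPU limit: " + gate.limit("CPU"));
        System.out.println("Network limit: " + gate.limit("Network"));
    }
}
```

Worker threads from any ordinary pool can call run; the gate only decides how many of them may work on the same resource at once.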

Let me present an alternative approach. Having a single thread pool is a nice abstraction, but it's not very performant, especially when the jobs are very IO-bound - then there's no good way to tune it, it's tempting to blow up the pool size to maximize IO throughput but you suffer from too many thread switches, etc.
Instead I'd suggest looking at the architecture of the Apache MINA framework for inspiration. (http://mina.apache.org/) It's a high-performance web framework - they describe it as a server framework, but I think their architecture works well for inverse scenarios as well, like spidering and multi-server clients. (Actually, you might even be able to use it out-of-the-box for your project.)
They use the Java NIO (non-blocking I/O) libraries for all IO operations, and divide up the work into two thread pools: a small and fast set of socket threads, and a larger and slower set of business logic threads. So the layers look as follows:
On the network end, a large set of NIO channels, each with a message buffer
A small pool of socket threads, which go through the channel list round-robin. Their only job is to check the socket, and move any data out into the message buffer - and if the message is done, close it out and transfer to the job queue. These guys are fast, because they just push bits around, and skip any sockets that are blocked on IO.
A single job queue that serializes all messages
A large pool of processing threads, which pull messages off the queue, parse them, and do whatever processing is required.
This makes for very good performance - IO is separated out into its own layer, and you can tune the socket thread pool to maximize IO throughput, and separately tune the processing thread pool to control CPU/resource utilization.
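The layering can be sketched with plain java.util.concurrent classes. To be clear, this is not the MINA API and it does not use NIO: the "I/O" stage just hands strings over, and TwoStagePipeline is a hypothetical name. It only shows the shape: a small pool that moves messages into a single job queue, and a separately sized pool that does the processing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the two-pool layout; not MINA and not real socket I/O.
final class TwoStagePipeline {

    // Returns the processed messages, sorted so the result is deterministic.
    static List<String> run(List<String> sources, int ioThreads, int workerThreads) {
        BlockingQueue<String> jobQueue = new LinkedBlockingQueue<>();
        Queue<String> results = new ConcurrentLinkedQueue<>();
        ExecutorService ioPool = Executors.newFixedThreadPool(ioThreads);
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        CountDownLatch done = new CountDownLatch(sources.size());

        for (String source : sources) {
            // Stage 1: stand-in for "move a complete message off the socket"; no parsing here.
            ioPool.execute(() -> jobQueue.add(source));
        }
        for (int i = 0; i < sources.size(); i++) {
            // Stage 2: each worker task takes one message and does the heavy processing.
            workers.execute(() -> {
                try {
                    String message = jobQueue.take();
                    results.add(message.toUpperCase());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        ioPool.shutdown();
        workers.shutdown();
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        Collections.addAll(pages, "page-b", "page-a", "page-c");
        System.out.println(run(pages, 1, 4));
    }
}
```

The two thread counts are independent parameters, which is the point of the design: the first can be tuned for I/O throughput and the second for CPU use.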

The example given is
Result[] a = new Result[N];
for(int i=0;i<N;i++) {
a[i] = compute(i);
}
In Java, the way to parallelize this across every free core, with the workload distributed dynamically so it doesn't matter if one task takes longer than another, is:
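import java.util.*;
import java.util.concurrent.*;

// defined earlier
int procs = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(procs);
// main loop: submit one task per index
// (a List is used because Java does not allow creating generic arrays)
List<Future<Result>> f = new ArrayList<Future<Result>>(N);
for (int i = 0; i < N; i++) {
    final int i2 = i; // anonymous classes can only capture final locals
    f.add(service.submit(new Callable<Result>() {
        public Result call() {
            return compute(i2);
        }
    }));
}
// collect the results in order; get() blocks until the task has finished and
// throws InterruptedException or ExecutionException, which the caller must handle
Result[] a = new Result[N];
for (int i = 0; i < N; i++)
    a[i] = f.get(i).get();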
This hasn't changed much in the last 5 years, so it's not as cool as it was when it was first available. What Java really lacks is closures. You can use Groovy instead if that is really a problem.
Additional: If you cared about performance, rather than just wanting an example, you would not calculate Fibonacci in parallel, because it's a good example of a function which is faster if you calculate it single-threaded.
One difference is that each thread pool only has one queue, so there is no need to steal work. This potentially means that you have more overhead per task. However, as long as your tasks typically take more than about 10 microseconds it won't matter.

I think you should monitor CPU utilization, in a platform-specific manner. Find out how many CPUs/cores you have, and monitor the load. When you find that the load is low, and you still have more work, create new threads - but not more than x times num-cpus (say, x=2).
If you really want to consider IO threads also, try to find out what state each thread is in when your pool is exhausted, and deduct all waiting threads from the total number. One risk is that you exhaust memory by admitting too many tasks, though.
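A sketch of that policy, with several assumptions spelled out: the 0.75 load threshold is an arbitrary choice, LoadBasedResizer and its methods are hypothetical names, and the number of waiting threads is passed in rather than measured. The load figure comes from OperatingSystemMXBean.getSystemLoadAverage(), which returns a negative value on platforms where it is unavailable (Windows, for example), so the sketch refuses to grow in that case.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; thresholds are assumptions to be tuned.
final class LoadBasedResizer {

    // Grow by one thread when load per CPU is low and work is queued.
    // Waiting threads are deducted, and the busy count never exceeds x * cpus.
    static int nextSize(int current, int waiting, int queued, double loadPerCpu, int cpus, int x) {
        int busy = current - waiting;
        boolean lowLoad = loadPerCpu >= 0 && loadPerCpu < 0.75; // negative means "unknown"
        if (lowLoad && queued > 0 && busy < x * cpus) {
            return current + 1;
        }
        return current;
    }

    // Applies a new size. Order matters: core size may never exceed maximum size.
    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        double loadPerCpu = load < 0 ? -1 : load / cpus;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                cpus, cpus, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        // waiting = 0 here; counting blocked threads would need your own bookkeeping.
        int size = nextSize(pool.getPoolSize(), 0, pool.getQueue().size(), loadPerCpu, cpus, 2);
        resize(pool, Math.max(1, size));
        System.out.println("pool size target: " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

In a real pool you would call nextSize periodically from a monitoring thread. Bounding the work queue as well helps with the memory risk mentioned above.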
