I am working on creating a game, and I have a pathfinding function that takes about 100 ms. I have 5 enemies, each with this code in the constructor:
newPath = new System.Threading.Timer((e) => {
getNewPath(); //The function that takes ~100 ms
}, null, 0, 5000);
Now, I am using System.Threading.Timer at an earlier point in the program (to run once every 50 ms for a step function that updates positions and such). That one works fine, but if I run this function (remember I have 5 enemies, so it's running 5 times every 5 seconds), my entire computer freezes. I don't have a crappy computer (it's not the best, but it's plenty good for what I'm using it for), so I don't know what the issue is. Even if all the timers ran one after the other (which they shouldn't; they should run at the same time), the most it should take is 500 ms (half a second), yet it completely kills my computer, to the point where my mouse doesn't move, I can't Ctrl-Alt-Del, and I have to hold the power button until it turns off.
I tested putting a simple print function in place of the getNewPath(), and it worked flawlessly and as expected, so I don't really know what the issue is.
My questions are:
What is causing my computer to lock up to the point of having to hold the power button?
Is there something other than System.Threading.Timer that I can use to get the desired result without completely killing my computer? (It needs to be able to run this function up to ~20 times at once, since it's an MMO and there could potentially be hundreds of enemies that need pathfinding updates.)
Thanks!
Without knowing the code inside getNewPath(), it is impossible to even guess the reason. And it is hard to believe that it is only a simple A* pathfinding algorithm.
Here are some points to start the investigation
What is the CPU usage before the halt happens? What is it when the halt happens? Which process has the highest usage?
What are the disk, network, and memory usage rates?
Besides the above, does getNewPath consume other resources?
You can print 5 messages, but are they printed before, inside, or after getNewPath?
Do you have source code for getNewPath? Can you modify code in getNewPath?
Is getNewPath thread safe? Does it create more threads?
There are probably more things to look at, but these should be enough to get you started, and they are necessary for anyone to give meaningful suggestions.
I am developing an application which analyses real-time financial data. Currently my main computational cycle has the following design:
long cycle_counter = 0;
while (process_data)
{
    // analyse data, issue instructions: ~5000 lines of straightforward computational code
    cycle_counter++;
    Thread.Sleep(5); // System.Threading.Thread
}
When I run this application on my notebook (one Core i5 processor), the cycle runs 200-205 times per second, roughly as expected (if you don't worry about why it runs more than 200 times a second).
But when I deploy the application on a "real" workstation, which has two 6-core Xeon processors and 24 GB of fast RAM and boots Win7 in about 3 seconds, the application runs the cycle only about 67 times per second.
My questions are:
Why is this happening?
How can I influence the number of runs per second in this situation?
Are there any better solutions for running the cycle 200-1000 times per second? I am now thinking about just removing Thread.Sleep() (the way I use it here is criticised a lot). With 12 cores I have no problem dedicating one core to this cycle. But might there be some downside to such a solution?
Thank you for your ideas.
The approach you're taking is simply fundamentally broken. Polling strategies are in general a bad way to go, and any time you do a Sleep for a reason other than "I want to give the rest of my timeslice back to the operating system", you're probably doing something wrong.
A better way to approach the problem is:
Make a threadsafe queue of unprocessed work
Make one thread that puts new work in the queue
Make n threads that take work out of the queue and do the work. n should be the number of CPUs you have minus one. If you have more than n threads then at least two threads are trading off CPU time, which is making them both slower!
The worker threads do nothing but sit in a loop taking work out of the queue and doing the work.
If the queue is empty then the "take work out" blocks.
When new work arrives, one of the blocked threads is reactivated.
How to build a queue with these properties is a famous problem called the Producer/Consumer Problem. There are lots of articles on how to do it and many implementations of blocking producer-consumer queues. I recommend finding an existing debugged one rather than trying to write your own; getting it right can be tricky.
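For reference, here is a minimal C# sketch of that design, assuming .NET 4.0 or later, where BlockingCollection<T> (System.Collections.Concurrent) is an existing, debugged blocking queue. WorkQueueDemo, Run and the integer work items are hypothetical placeholders for your real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class WorkQueueDemo
{
    // "Number of CPUs minus one", but never fewer than one worker.
    public static int DefaultWorkerCount() => Math.Max(1, Environment.ProcessorCount - 1);

    // One producer (the calling thread) feeds 'itemCount' items to 'workerCount' consumers.
    // Returns the sum of work(item) over all items.
    public static long Run(int itemCount, int workerCount, Func<int, long> work)
    {
        using var queue = new BlockingCollection<int>(boundedCapacity: 1000);
        long total = 0;

        var workers = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = new Thread(() =>
            {
                // Blocks while the queue is empty; the loop ends after CompleteAdding()
                // has been called and the remaining items have been taken.
                foreach (int item in queue.GetConsumingEnumerable())
                    Interlocked.Add(ref total, work(item));
            });
            workers[w].IsBackground = true;
            workers[w].Start();
        }

        for (int i = 1; i <= itemCount; i++)
            queue.Add(i);          // producer: blocks only if 1000 items are already waiting
        queue.CompleteAdding();    // tell consumers no more work is coming

        foreach (var t in workers)
            t.Join();
        return total;
    }
}
```

In a real application the producer would add items as data arrives rather than in a loop, and CompleteAdding() would only be called at shutdown.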
Windows is not an RTOS (Real-Time Operating System), so you cannot precisely determine when your thread will resume. Thread.Sleep(5) really means "wake me up no sooner than 5 ms". The actual sleep time is determined by the specific hardware and mostly by the system load. You can try to work around the system load issue by running your application at a higher priority.
BTW, System.Threading.Timer is a better approach (above comments still apply though).
The resolution of Sleep is dictated by the current timer tick interval, which is usually either 10 or 15 milliseconds depending on the edition of Windows. This can be changed, however, by calling timeBeginPeriod. See this answer.
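If you want to see what Thread.Sleep(5) actually does on a given machine, a rough sketch like this times it with Stopwatch, with and without a 1 ms timer period. It assumes Windows for the timeBeginPeriod/timeEndPeriod P/Invoke (exported by winmm.dll) and simply skips that call elsewhere; SleepResolution and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

public static class SleepResolution
{
    // Windows-only multimedia timer functions from winmm.dll.
    [DllImport("winmm.dll")] private static extern uint timeBeginPeriod(uint uPeriod);
    [DllImport("winmm.dll")] private static extern uint timeEndPeriod(uint uPeriod);

    // Average measured duration, in ms, of Thread.Sleep(requestedMs) over 'samples' calls.
    public static double MeasureSleepMs(int requestedMs, int samples)
    {
        double totalMs = 0;
        for (int i = 0; i < samples; i++)
        {
            var sw = Stopwatch.StartNew();
            Thread.Sleep(requestedMs);
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
        }
        return totalMs / samples;
    }

    // Same measurement with the timer period raised to 1 ms (Windows only; elsewhere it just measures).
    public static double MeasureWithOneMsPeriod(int requestedMs, int samples)
    {
        bool onWindows = RuntimeInformation.IsOSPlatform(OSPlatform.Windows);
        if (onWindows) timeBeginPeriod(1);
        try
        {
            return MeasureSleepMs(requestedMs, samples);
        }
        finally
        {
            if (onWindows) timeEndPeriod(1); // always undo with the same value
        }
    }
}
```

Comparing MeasureSleepMs(5, 50) with MeasureWithOneMsPeriod(5, 50) on the notebook and on the workstation would show whether the timer resolution explains the difference. Pair every timeBeginPeriod call with timeEndPeriod, as the finally block does.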
Check your timer's actual frequency: many hardware timers have an actual resolution of 65536 ticks per hour, i.e. 65536 / 3600 ≈ 18.204 ticks per second.
This is the so-called "18.2" constant, which is why the actual timer resolution is 1/18.2 ≈ 55 ms; in the case of Sleep(5) it means the call could behave like either Sleep(0) or Sleep(55), depending on rounding.
I'm not sure it is the best approach, but it is another approach.
Try BlockingCollection and all you do in the producer is add and sleep.
The consumer then has the option to work full time if needed.
This still does not explain why the higher-powered PC ran fewer cycles.
Is it OK for you to run your loop 200 times per second on average?
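var delay = TimeSpan.FromMilliseconds(5);
var nextDue = DateTime.Now.Add(delay); // time at which the next iteration is due
while (process_data) {
    Console.WriteLine("do work");
    var now = DateTime.Now;
    if (now < nextDue)
        System.Threading.Thread.Sleep(nextDue - now);
    nextDue = nextDue.Add(delay); // advance from the previous due time, not from "now"
}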
Using this technique, your loop will run somewhat unevenly, but it should be OK on average, as the code depends neither on the resolution of Sleep nor on the resolution of DateTime.Now.
You might even combine this approach with a Timer.
I always wondered how all the programs that have a progress bar can know almost exactly how much time an operation (and the whole of the program's work) takes to finish, and thus map that onto the progress bar.
In C#, I always face difficulties when dealing with a progress bar. I came to a conclusion that there's no generic solution to this, is there?
I'm not asking this:
"For example: how would you know how much time File.ReadAllBytes(path) takes to read a file? Simple: start a stopwatch before it, and stop it and read the time after it. But that's just on your computer, with your CPU and your disk! Reading 1 MB will usually take a different amount of time on other machines."
I'm asking how I can know how much time something needs to finish before even going into it. I mean, I have to know how much time File.ReadAllBytes() takes before it executes, so that the progress bar can step accordingly DURING the method's execution.
One dumb way of doing this is to do it twice! Start the operation, measure the time, then run it again, but this time step the progress bar (lol)
I don't know if complexity has anything to do with this. I use big-O when I write my own methods and functions, not when dealing with a pre-defined function.
EDIT: Just an example:
I made a program that XORs a file and overwrites it with the XORed form. This involves 1) reading the file's bytes into a byte array, 2) XORing them, and 3) writing them back to the file. Now if I want to use a progress bar, how can I know how much time those operations take, so that I can make the progress bar increase accordingly?
My solution was to use a global variable (like prgBar), run the XORing on a separate thread, and increment prgBar in that thread each time I XOR a byte from the byte array I read from the file. Then in the main thread I used a timer, and on each tick I did: prgressBar.Value = prgBar
I'm facing problems with this: it's not even accurate, and it might start late.
The quick answer is "They don't". Haven't you ever seen progress bars which jump from "20 minutes left" to "5 minutes left" in the course of 20 seconds? Or ones which say "0:00 left" and just sit there?
The better answer is "They estimate". There are two forms of this I can think of, but I haven't done any extensive research. Both use past performance to estimate future progress, though.
The first form works for a series of discrete tasks of similar complexity but arbitrary length, such as file copying/deletion. You start off with a low base estimate or a "Calculating..." message, and then as each file is finished you make a new estimate of how long it will take based on how long it's taken to copy/delete all the files so far. For example, if you're deleting 2000 files, and the first 5 were deleted in {400, 200, 900, 100, 400} ms, then you know you can delete 5 files in 2000 ms (400 ms per file on average), which means you can delete 2000 files in 800000 ms, or 800 seconds, and display a progress bar to your user saying "13 minutes remaining" (798 seconds for the remaining 1995 files). It might not be accurate, but if you keep revising it after every few files, it will progress fairly smoothly.
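The first form can be sketched in C# like this (EtaEstimator is a hypothetical helper, not a library API): keep a running total of per-task durations and multiply the average by the number of tasks left. With the numbers above (2000 files, the first five taking 400, 200, 900, 100 and 400 ms), Remaining() comes out at 798 seconds.

```csharp
using System;

public sealed class EtaEstimator
{
    private readonly int _totalTasks;
    private int _completed;
    private double _elapsedMs;

    public EtaEstimator(int totalTasks)
    {
        if (totalTasks <= 0) throw new ArgumentOutOfRangeException(nameof(totalTasks));
        _totalTasks = totalTasks;
    }

    // Call once per finished task with the time it took.
    public void TaskCompleted(double durationMs)
    {
        _completed++;
        _elapsedMs += durationMs;
    }

    public double PercentComplete => 100.0 * _completed / _totalTasks;

    // Null until at least one task has finished (show "Calculating..." instead).
    // Afterwards: average time per finished task * number of tasks left.
    public TimeSpan? Remaining()
    {
        if (_completed == 0) return null;
        double averageMs = _elapsedMs / _completed;
        return TimeSpan.FromMilliseconds(averageMs * (_totalTasks - _completed));
    }
}
```

Because the average is recomputed after every task, the estimate self-corrects as more files finish, which is the "keep revising it" part of the answer.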
The second form is when the lengths and complexity of the tasks are known (and vary greatly), but the speed is still unknown, such as with an installer. In this scenario you can make your estimate by running your installer many times in testing and figuring out how the different components' times relate. This could be as complex as comparing individual steps to derive a formula, or as simple as assigning a percentage of the install time to each stage. Then, when the user does their install, if the first 2% worth of components took X seconds, you estimate about 49*X seconds left (50*X in total). This is why installers often seem to hang or jump around: they're waiting on something that's unexpectedly taking much longer than usual, or they discovered that they didn't need to perform a step on your system.
In the end, it's all just an estimate. Percentages can be exact ("I'm on step 1/5, so show 20% complete"), but durations never can be. See Jbecwar's answer for the theory behind this.
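A rough sketch of the second form, assuming per-stage weights that you measured in testing (the weights and names here are hypothetical): the fraction done is the weight of the finished stages divided by the total weight, and the remaining time is extrapolated at the same rate. For example, if the first 2% took 10 seconds, the estimate for the rest is about 490 seconds (49 times as long).

```csharp
using System;
using System.Linq;

public static class WeightedEstimate
{
    // Fraction of the total work represented by the first 'stagesDone' stages.
    public static double FractionDone(double[] stageWeights, int stagesDone)
    {
        double total = stageWeights.Sum();
        double done = stageWeights.Take(stagesDone).Sum();
        return done / total;
    }

    // If 'elapsed' covered 'fractionDone' of the work, extrapolate the rest at the same rate.
    public static TimeSpan Remaining(TimeSpan elapsed, double fractionDone)
    {
        if (fractionDone <= 0 || fractionDone > 1)
            throw new ArgumentOutOfRangeException(nameof(fractionDone));
        return TimeSpan.FromTicks((long)(elapsed.Ticks * (1 - fractionDone) / fractionDone));
    }
}
```

If a stage turns out to be skipped or unexpectedly slow on the user's machine, the extrapolation jumps, which is exactly the installer behaviour described above.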
What you're describing is the halting problem. Long story short, you can't tell when an arbitrary process will end, but you can make a guess.
For example, let's say you have 1 MB to transfer. If you transfer one byte at a time and update the progress bar after each byte, it will look smooth. You could also display a time estimate based on the past: for example, if it took me 1 second to move 10% of the 1 MB, then I should be done in another 9 seconds.
Hope that helps.
You have to calculate the completeness of the operation yourself and it's not always possible to do accurately. For reading a file you could do something like get number of lines, count how many you've read, then do current/total to display progress. This actually works a lot better for more complex operations.
Say you have an array of 1000 objects whose data needs to be processed. You're doing this in a for loop, so the progress bar can simply display i / array.Length. The ReadAllBytes example is no good because there is no way of tracking the progress of that task: the .NET method returns all the bytes at once, so you can only know when the task starts and completes, not to mention the task will be so quick that a progress bar is unnecessary.
Actually, you should read the file in batches (e.g. 20 KB at a time) so you can increment the progress bar after each batch.
You are correct, there is no one-size-fits-all solution for progress bars unless you consider using the progress bar as an "I'm still processing" bar. You could have the progress bar update from 0 to 100 (max) over and over again.
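Applied to the XOR example from the question, a minimal sketch (assuming a FileStream opened for read/write; ChunkedXor and its parameters are hypothetical) reads a batch, XORs it, writes it back over the same bytes, and reports bytesDone/bytesTotal after each batch, so there is no global counter polled by a timer:

```csharp
using System;
using System.IO;

public static class ChunkedXor
{
    // XORs every byte of the file with 'key', in place, 'bufferSize' bytes at a time,
    // calling reportPercent(0..100) after each batch.
    public static void XorFileInPlace(string path, byte key, int bufferSize, Action<int> reportPercent)
    {
        using var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite);
        long total = stream.Length;
        long done = 0;
        var buffer = new byte[bufferSize];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                buffer[i] ^= key;
            stream.Seek(-read, SeekOrigin.Current); // step back over the bytes just read
            stream.Write(buffer, 0, read);          // overwrite them with the XORed bytes
            done += read;
            reportPercent((int)(100 * done / total));
        }
    }
}
```

In a WinForms app you could run this on a background thread and pass pct => progress.Report(pct), where progress is a Progress<int> created on the UI thread; Progress<T> invokes its handler on the synchronization context it was created on, so that handler can set the ProgressBar's Value.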
There are times where you will know how you are progressing, and can report it back to the user.
Let's take your File.ReadAllBytes(path) example:
You should be able to get the size of your file. You can also determine how many bytes you have read so far by incrementing a variable. With some simple math, bytesRead/bytesInFile gives you the fraction completed. Report this back to the user with a timer that ticks every 500 ms, and you should have a progress bar with proper feedback.
I'm creating an XNA game and am getting an unexpected result, extremely low FPS (About 2-12 fps). What program should I use to test performance and track down what is slowing it down?
Have you tried using SlimTune?
You could also use Stopwatch to measure sections manually.
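A minimal sketch of measuring sections manually with System.Diagnostics.Stopwatch (SectionTimer and the section names are hypothetical, not part of XNA): wrap each part of Update/Draw, accumulate the time per section over many frames, and print the totals.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public sealed class SectionTimer
{
    private readonly Dictionary<string, TimeSpan> _totals = new Dictionary<string, TimeSpan>();

    // Runs 'body' and adds its elapsed time to the running total for 'section'.
    public void Measure(string section, Action body)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            body();
        }
        finally
        {
            sw.Stop();
            _totals.TryGetValue(section, out TimeSpan soFar); // TimeSpan.Zero if not seen yet
            _totals[section] = soFar + sw.Elapsed;
        }
    }

    public TimeSpan Total(string section) =>
        _totals.TryGetValue(section, out TimeSpan t) ? t : TimeSpan.Zero;

    // Print accumulated totals, e.g. every few seconds or when the game exits.
    public void Report()
    {
        foreach (var pair in _totals)
            Console.WriteLine($"{pair.Key}: {pair.Value.TotalMilliseconds:F1} ms");
    }
}
```

In Update you might write sections.Measure("physics", () => UpdatePhysics(gameTime)), where UpdatePhysics stands for whatever your game calls there; the section with the largest total is where to look first.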
Okay, let me bring my personal experience with game development in XNA.
The first thing you need to do is go to Debug -> Start Performance Analysis. This profiles your CPU activity and shows which threads are in use and what is doing the most processing.
You also need to factor in a couple of more things:
-You are probably running in debug mode, which means some of your CPU is being spent on VS, on checking for exceptions, and what not.
-Your code might be inefficient. I recommend limiting the number of Lists, Arrays, ADTs, and objects created at run time, because that slows things down a lot. Last time I checked, the game loop ran 60 times a second, so imagine what a strain it would be to allocate a new List and then garbage collect it 60 times a second. It starts adding up.
-I don't know how advanced you are, but read up on parallel threading, or multitasking.
An example would be to have your physics engine run one frame behind your graphics update.
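To illustrate the point about allocating a new List every frame, here is a hedged sketch of the usual alternative: keep one List for the lifetime of the object and Clear() it each frame, which keeps the internal array, so once it has grown large enough the per-frame work allocates no new list. VisibleEntityBuffer and the visibility test are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class VisibleEntityBuffer
{
    private readonly List<int> _visible = new List<int>(256);

    // Hypothetical per-frame step: collect the ids that pass a visibility test.
    public IReadOnlyList<int> CollectVisible(int[] entityIds, Func<int, bool> isVisible)
    {
        _visible.Clear(); // empties the list but keeps its internal array: no new allocation
        foreach (int id in entityIds)
            if (isVisible(id))
                _visible.Add(id);
        return _visible;
    }

    public int Capacity => _visible.Capacity;
}
```

The trade-off is that the returned list is overwritten on the next call, so callers must not hold on to it across frames.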
EDIT: I realized you found your mistake but I hope this post can help others.
I am trying to make the loading part of a C# program faster. Currently it takes like 15 seconds to load up.
At first glance, the things done during loading include constructing many 3rd-party UI components, loading layout files, XMLs, DLLs and resource files, reflection, waiting for WndProc, etc.
I used something really simple to see how long some part takes, i.e. breakpointing at a double holding the total milliseconds of a TimeSpan, which is the difference between a DateTime.Now at the start and a DateTime.Now at the end.
Trying that a few times gives me something like,
11s 13s 12s 12s 7s 11s 12s 11s 7s 13s 7s.. (Usually 12s, but 7s sometimes)
If I add SuspendLayout and BeginUpdate like hell, call things via reflection once instead of many times, and reduce some redundant redundant computation redundancy, the times are like 3s 4s 3s 4s 3s 10s 4s 4s 3s 4s 10s 3s 10s.... (Usually 4s, but 10s sometimes)
In both cases the times are not consistent but look more like a bimodal distribution. This really makes me unsure whether my changes to the code are actually making it faster.
So I would like to know what will cause such result.
Debug mode?
The "C# has to compile/interpret the code the first time it runs, but later runs will be faster" thing?
The waiting of WndProc message?
The reflections? PropertyInfo? Reflection.Assembly?
Loading files? XML? DLL? resource file?
UI Layouts?
(There are surely no internet/network/database access in that part)
Thanks.
Profiling by stopping in the debugger is not a reliable way to get timings, as you've discovered.
Profiling by writing times to a log works fine, although why do all this by hand when you can just launch the program in dotTrace? (Free trial, fully functional).
Another thing that works when you don't have access to a profiler is what I call the binary approach: look at what happens in the code and try to disable about half of it by commenting it out. Note the effect on the running time. If it appears significant, repeat this process with half of that half, and so on recursively until you narrow in on the most significant piece of work. The difficulty is in simulating the side effects of the missing code so that the remaining code can still work, so this is still harder than using a profiler, but it can be quicker than adding a lot of manual time logging, because the binary approach lets you zero in on the slowest place in logarithmic time.
Raymond Chen's advice is good here. When people ask him "How can I make my application start up faster?" he says "Do less stuff."
(And ALWAYS profile the release build - profiling the debug build is generally a wasted effort).
Profile it. You can use EQATEC; it's free.
Well, the best thing is to run your application through a profiler and see what the bottlenecks are. I've personally used dotTrace, there are plenty of others you can find on the web.
Debug mode turns off a lot of JIT optimizations, so apps will run a lot slower than release builds. Whatever the mode, JITting has to happen, so I'd discount that as a significant factor. Time to read files from disk can vary based on the OS's caching mechanism, and whether you're doing a cold start or a warm start.
If you have to use timers to profile, I'd suggest repeating the experiment a large number of times and taking the average.
Profiling you code is definitely the best way to identify which areas are taking the longest to run.
As for the other part of your question, about the inconsistent timings: timings in a multitasking OS are inherently inconsistent, and managed code throws the garbage collector into the equation too. The GC could be kicking in during your timing, which will obviously slow things down.
If you want a "purer" timing, try forcing a GC collection before you start your timers; this way the GC is less likely to kick in during your timed section. Do remember to remove the forced collection along with the timers afterwards, as second-guessing when the GC should run normally results in poorer performance.
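Combining the suggestions to repeat the measurement many times and to force a collection before each run, a rough C# sketch (Bench is a hypothetical helper) might look like this. It also does one untimed warm-up call so that JIT compilation isn't included; that deliberately hides the cold-start cost you see at real startup, so it is better suited to comparing two versions of an individual piece than to timing the whole startup.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs 'action' once untimed (warm-up), then 'runs' timed times.
    // Returns the mean and (population) standard deviation in milliseconds.
    public static (double MeanMs, double StdDevMs) Measure(Action action, int runs, bool collectFirst = true)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));
        action(); // warm-up: JIT compilation and first-touch costs happen here, untimed

        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            if (collectFirst)
            {
                // Measurement-only: make a collection during the timed region less likely.
                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();
            }
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        double mean = 0;
        foreach (double s in samples) mean += s;
        mean /= runs;

        double sumSq = 0;
        foreach (double s in samples) sumSq += (s - mean) * (s - mean);
        return (mean, Math.Sqrt(sumSq / runs));
    }
}
```

A large standard deviation relative to the mean is itself a hint that something outside the measured code (disk cache, GC, other processes) is dominating, like the 4s/10s split in the question.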
Apart from the obvious (profiling), which will tell you precisely where time is being spent, there are some other points that spring to mind:
To get reasonable timing results with the approach you are using, run a release build of your program, and have it dump the timing results to a file (e.g. with Trace.WriteLine). Timing a debug version will give you spurious results. When running the timing tests, quit all other applications (including your debugger) to minimise the load on your computer and get more consistent results. Run the program many times and look at the average timings. Finally, bear in mind that Windows caches a lot of stuff, so the first run will be slow and subsequent runs will be much faster. This will at least give you a more consistent basis to tell whether your improvements are making a significant difference.
Don't try and optimise code that shouldn't be run in the first place - Can you defer any of the init tasks? You may find that some of the work can simply be removed from the init sequence. e.g. if you are loading a data file, check whether it is needed immediately - if not, then you could load it the first time it is needed instead of during program startup.
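For the deferral idea, Lazy<T> (in System since .NET 4.0) is one existing way to move a load step out of startup: the factory runs only on the first access to Value. AppResources and LoadLayoutFile are hypothetical stand-ins for one of your init tasks.

```csharp
using System;
using System.Threading;

public sealed class AppResources
{
    private readonly Lazy<string> _layout;

    public AppResources()
    {
        // Nothing is loaded here; the factory runs on the first read of Layout.
        _layout = new Lazy<string>(LoadLayoutFile, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public int LoadCount { get; private set; }
    public bool IsLayoutLoaded => _layout.IsValueCreated;
    public string Layout => _layout.Value;

    private string LoadLayoutFile()
    {
        LoadCount++;
        return "<layout/>"; // placeholder: real code would read and parse the file here
    }
}
```

ExecutionAndPublication makes the factory run at most once even if several threads read Layout at the same time.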
At my job, I have a clutch of six Windows services that I am responsible for, written in C# 2003. Each of these services contain a timer that fires every minute or so, where the majority of their work happens.
My problem is that, as these services run, they start to consume more and more CPU time on each iteration of the loop, even when there is no meaningful work for them to do (i.e., they're just idling, looking through the database for something to do). When they start up, each service uses an average of about 2-3% of 4 CPUs, which is fine. After 24 hours, each service will be consuming an entire processor for the duration of its loop's run.
Can anyone help? I'm at a loss as to what could be causing this. Our current solution is to restart the services once a day (they shut themselves down, then a script sees that they're offline and restarts them at about 3AM). But this is not a long term solution; my concern is that as the services get busier, restarting them once a day may not be sufficient... but as there's a significant startup penalty (they all use NHibernate for data access), as they get busier, exactly what we don't want to be doing is restarting them more frequently.
#akmad: True, it is very difficult.
Yes, a service run in isolation will show the same symptom over time.
No, it doesn't. We've looked at that. This can happen at 10AM or 6PM or in the middle of the night. There's no consistency.
We do; and they are. The services are doing exactly what they should be, and nothing else.
Unfortunately, that requires foreknowledge of exactly when the services are going to be maxing out CPUs, which happens on an unpredictable schedule, and never very quickly... which makes things doubly difficult, because my boss will run and restart them when they start having problems without thinking of debug issues.
No, they're using a fairly consistent amount of RAM (approx. 60-80MB each, out of 4GB on the machine).
Good suggestions, but rest assured, we have tried all of the usual troubleshooting. What I'm hoping is that this is a .NET issue that someone might know about, that we can work on solving. My boss' solution (which I emphatically don't want to implement) is to put a field in the database which holds multiple times for the services to restart during the day, so that he can make the problem go away and not think about it. I'm desperately seeking the cause of the real problem so that I can fix it, because that solution will become a disaster in about six months.
#Yaakov Ellis: They each have a different function. One reads records out of an Oracle database somewhere offsite; another one processes those records and transfers files belonging to those records over to our system; a third checks those files to make sure they're what we expect them to be; another is a maintenance service that constantly checks things like disk space (that we have enough) and polls other servers to make sure they're alive; one is running only to make sure all of these other ones are running and doing their jobs, monitors and reports errors, and restarts anything that's failed to keep the whole system going 24 hours a day.
So, if you're asking what I think you're asking, no, there isn't one common thing that all these services do (other than database access via NHibernate) that I can point to as a potential problem. Unfortunately, if that turns out to be the actual issue (which wouldn't surprise me greatly), the whole thing might be screwed -- and I'll end up rewriting all of them in simple SQL. I'm hoping it's a garbage collector problem or something easier to deal with than NHibernate.
#Joshdan: No secret. As I said, we've tried all the usual troubleshooting. Profiling was unhelpful: the profiler we use was unable to point to any code that was actually executing when the CPU usage was high. These services were torn apart about a month ago looking for this problem. Every section of code was analyzed to attempt to figure out if our code was the issue; I'm not here asking because I haven't done my homework. Were this a simple case of the services doing more work than anticipated, that's something that would have been caught.
The problem here is that, most of the time, the services are not doing anything at all, yet still manage to consume 25% or more of four CPU cores: they're finding no work to do, and exiting their loop and waiting for the next iteration. This should, quite literally, take almost no CPU time at all.
Here's an example of the behaviour we're seeing, on a service with no work to do for two days (in an unchanging environment). This was captured last week:
Day 1, 8AM: Avg. CPU usage approx 3%
Day 1, 6PM: Avg. CPU usage approx 8%
Day 2, 7AM: Avg. CPU usage approx 20%
Day 2, 11AM: Avg. CPU usage approx 30%
Having looked at all of the possible mundane reasons for this, I've asked this question here because I figured (rightly, as it turns out) that I'd get more innovative answers (like Ubiguchi's), or pointers to things I hadn't thought of (like Ian's suggestion).
So does the CPU spike happen immediately preceding the timer callback, within the timer callback, or immediately following the timer callback?
You misunderstand. This is not a spike. If it were, there would be no problem; I can deal with spikes. But it's not... the CPU usage is going up generally. Even when the service is doing nothing, waiting for the next timer hit. When the service starts up, things are nice and calm, and the graph looks like what you'd expect... generally, 0% usage, with spikes to 10% as NHibernate hits the database or the service does some trivial amount of work. But this increases to an across-the-board 25% (more if I let it go too far) usage at all times while the process is running.
That made Ian's suggestion the logical silver bullet (NHibernate does a lot of stuff when you're not looking). Alas, I've implemented his solution, but it hasn't had an effect (I have no proof of this, but I actually think it's made things worse... average usage is seeming to go up much faster now). Note that stripping out the NHibernate "sections" (as you recommend) is not feasible, since that would strip out about 90% of the code in the service, which would let me rule out the timer as a problem (which I absolutely intend to try), but can't help me rule out NHibernate as the issue, because if NHibernate is causing this, then the dodgy fix that's implemented (see below) is just going to have to become The Way The System Works; we are so dependent on NHibernate for this project that the PM simply won't accept that it's causing an unresolvable structural problem.
I just noted a sense of desperation in the question -- that your problems would continue barring a small miracle
Don't mean for it to come off that way. At the moment, the services are being restarted daily (with an option to input any number of hours of the day for them to shutdown and restart), which patches the problem but cannot be a long-term solution once they go onto the production machine and start to become busy. The problems will not continue, whether I fix them or the PM maintains this constraint on them. Obviously, I would prefer to implement a real fix, but since the initial testing revealed no reason for this, and the services have already been extensively reviewed, the PM would rather just have them restart multiple times than spend any more time trying to fix them. That's entirely out of my control and makes the miracle you were talking about more important than it would otherwise be.
That is extremely intriguing (insofar as you trust your profiler).
I don't. But then, these are Windows services written in .NET 1.1 running on a Windows 2000 machine, deployed by a dodgy Nant script, using an old version of NHibernate for database access. There's little on that machine I would actually say I trust.
You mentioned that you're using NHibernate - are you closing your NHibernate sessions at appropriate points (such as the end of each iteration?)
If not, then the size of the object map loaded into memory will be gradually increasing over time, and each session flush will take increasingly more CPU time.
Here's where I'd start:
Get Process Explorer and show %Time in JIT, %Time in GC, CPU Cycles Delta, CPU Time, CPU %, and Threads.
You'll also want kernel and user time, and a couple of representative stack traces but I think you have to hit Properties to get snapshots.
Compare before and after shots.
A couple of thoughts on possibilities:
excessive GC (% Time in GC going up. Also, Perfmon GC and CPU counters would correspond)
excessive threads and associated context switches (# of threads going up)
polling (stack traces are consistently caught in a single function)
excessive kernel time (kernel times are high - Task Manager shows large kernel time numbers when CPU is high)
exceptions (PE .NET tab Exceptions thrown is high and getting higher. There's also a Perfmon counter)
virus/rootkit (OK, this is a last ditch scenario - but it is possible to construct a rootkit that hides from TaskManager. I'd suspect that you could then allocate your inevitable CPU usage to another process if you were cunning enough. Besides, if you've ruled out all of the above, I'm out of ideas right now)
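To complement Process Explorer and Perfmon, a service can also log some of the same counters itself at the end of every iteration, which makes before/after comparisons over a day easy to read from the log. This is a sketch assuming a modern .NET runtime (AppDomain.FirstChanceException, for example, only exists from .NET 4.0, so it won't work as-is on the .NET 1.1 services described above); IterationDiagnostics is a hypothetical name, and one instance per process is assumed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public sealed class IterationDiagnostics
{
    private long _firstChanceExceptions;
    private TimeSpan _lastCpu;
    private int _lastGen0, _lastGen2;

    public IterationDiagnostics()
    {
        // Counts every exception thrown in this AppDomain, including ones that are caught.
        AppDomain.CurrentDomain.FirstChanceException +=
            (sender, e) => Interlocked.Increment(ref _firstChanceExceptions);
        using (var p = Process.GetCurrentProcess())
            _lastCpu = p.TotalProcessorTime;
        _lastGen0 = GC.CollectionCount(0);
        _lastGen2 = GC.CollectionCount(2);
    }

    public long FirstChanceExceptions => Interlocked.Read(ref _firstChanceExceptions);

    // One log line: totals plus deltas since the previous snapshot.
    public string Snapshot()
    {
        using var p = Process.GetCurrentProcess();
        TimeSpan cpu = p.TotalProcessorTime;
        int gen0 = GC.CollectionCount(0);
        int gen2 = GC.CollectionCount(2);
        string line =
            $"threads={p.Threads.Count} " +
            $"cpuDeltaMs={(cpu - _lastCpu).TotalMilliseconds:F0} " +
            $"userMs={p.UserProcessorTime.TotalMilliseconds:F0} " +
            $"kernelMs={p.PrivilegedProcessorTime.TotalMilliseconds:F0} " +
            $"gen0Delta={gen0 - _lastGen0} gen2Delta={gen2 - _lastGen2} " +
            $"exceptions={FirstChanceExceptions}";
        _lastCpu = cpu;
        _lastGen0 = gen0;
        _lastGen2 = gen2;
        return line;
    }
}
```

A thread count, gen2Delta or exception count that keeps rising from one idle iteration to the next would be consistent with the thread, GC or exception scenarios in the list; a high kernelMs share would point towards the kernel-time scenario.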
It's obviously pretty difficult to remotely debug your unknown application... but here are some things I'd look at:
What happens when you only run one of the services at a time? Do you still see the slow-down? This may indicate that there is some contention between the services.
Does the problem always occur around the same time, regardless of how long the service has been running? This may indicate that something else (a backup, virus scan, etc) is causing the machine (or db) as a whole to slow down.
Do you have logging or some other mechanism to be sure that the service is only doing work as often as you think it should?
If you can see the performance degradation over a short time period, try running the service for a while and then attach a profiler to see exactly what is pegging the CPU.
You don't mention anything about memory usage. Do you have any of this information for the services? It's possible that you're using up most of the RAM and causing the disk to thrash, or some similar problem.
Best of luck!
I suggest breaking the problem into pieces.
First, find a way to reproduce the problem quickly and 100% of the time. Lower the timer interval so that the services fire more frequently (for example, 10 times quicker than normal). If the problem arises 10 times quicker, then it's related to the number of iterations and not to real time or to the real work done by the services. And you will be able to do the next steps quicker than once a day.
Second, comment out all the real work code, leaving only the services, the timers and the synchronization mechanism. If the problem still shows up, then it must be in that part of the code.
If it doesn't, then start adding back the code you commented out, one piece at a time. Eventually, you should find out what part of the code is causing the problem.
'Fraid this answer is only going to suggest some directions for you to look in, but having seen similar problems in .NET Windows Services I have a couple of thoughts you might find helpful.
My first suggestion is that your services might have some bugs in either the way they handle memory or the way they handle unmanaged memory. The last time I tracked down a similar issue, it turned out a 3rd-party OSS library we were using stored handles to unmanaged objects in static memory. The longer the service ran, the more handles it picked up, which caused the process's CPU performance to nose-dive very quickly. The way to try to resolve this sort of issue is to ensure your services store nothing in memory in between the timer invocations, although if your 3rd-party libraries use static memory you might have to do something clever, like creating an app domain for the timer invocation and ditching the app domain (and its static memory) once processing is complete.
The other issue I've seen in similar circumstances was with the timer synchronization code being suspect, which in effect allowed more than one thread to be running the processing code at once. When we debugged the code we found the 1st thread was blocking the 2nd, and by the time the 2nd kicked off there was a 3rd being blocked. Over time the blocking was lasting longer and longer and the CPU usage was therefore heading to the top. The solution we used to fix the issue was to implement proper synchronization code so the timer only kicked off another thread if it wouldn't be blocked.
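A minimal sketch of that kind of synchronization, assuming the simplest policy: if the previous callback is still running when the timer fires, skip that tick instead of starting another thread that would just block. Interlocked.CompareExchange and System.Threading.Timer are real .NET APIs; NonOverlappingTimer and its members are hypothetical, and Tick is public only so it can be exercised directly.

```csharp
using System;
using System.Threading;

public sealed class NonOverlappingTimer : IDisposable
{
    private readonly Action _work;
    private readonly Timer _timer;
    private int _running; // 0 = idle, 1 = a callback is executing
    private int _skipped;

    public NonOverlappingTimer(Action work, TimeSpan period)
    {
        _work = work;
        _timer = new Timer(_ => Tick(), null, period, period);
    }

    public int SkippedTicks => Volatile.Read(ref _skipped);

    // Returns true if the work ran, false if this tick was skipped.
    public bool Tick()
    {
        // Atomically claim the flag; if it was already 1, another callback is still busy.
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0)
        {
            Interlocked.Increment(ref _skipped);
            return false;
        }
        try
        {
            _work();
            return true;
        }
        finally
        {
            Volatile.Write(ref _running, 0);
        }
    }

    public void Dispose() => _timer.Dispose();
}
```

An alternative with the same effect is to create the timer with a period of Timeout.Infinite and call Change(dueTime, Timeout.Infinite) at the end of each callback, so the next run is only scheduled after the current one has finished. Logging SkippedTicks over time would also show whether callbacks are piling up.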
Hope this helps, but apologies up front if both my thoughts are red herrings.
Sounds like a threading issue with the timer. You might have one unit of work blocking another running on different worker threads, causing them to stack up every time the timer fires. Or you might have instances living and working longer than you expect.
I'd suggest refactoring out the timer. Replace it with a single thread that queues up work on the ThreadPool. You can Sleep() the thread to control how often it looks for new work. Make sure this is the only place where your code is multithreaded. All other objects should be instantiated as work is readied for processing and destroyed after that work is completed. STATE IS THE ENEMY in multithreaded code.
Another area where the design is lacking appears to be that you have multiple services that are polling resources to do something. I'd suggest unifying them under a single service. They might do separate things, but they're working in unison; you're just using the filesystem, database, etc. as a substitute for method calls. Also, 2003? I feel bad for you.
Good suggestions, but rest assured, we have tried all of the usual troubleshooting. What I'm hoping is that this is a .NET issue that someone might know about, that we can work on solving.
My feeling is that no matter how bizarre the underlying cause, the usual troubleshooting steps are your best bet for locating the issue.
Since this is a performance issue, good measurements are invaluable. The overall process CPU usage is far too broad a measurement. Where is your service spending its time? You could use a profiler to measure this, or just log various section start and stops. If you aren't able to do even that, then use Andrea Bertani's suggestion -- isolate sections by removing others.
Once you've located the general area, then you can make even finer-grained measurements, until you sort out the source of the CPU usage. If it's not obvious how to fix it at that point, you at least have ammunition for a much more specific question.
If you have in fact already done all this usual troubleshooting, please do let us in on the secret.