QueueUserWorkItem() performance issues using Mono C# - c#

I'm getting intermittent spikes in my profiler around the call to QueueUserWorkItem(). I queue roughly 20 jobs every second. Each job is roughly identical in terms of complexity. 95% of the jobs spend less than 0.01ms in the call to QueueUserWorkItem. But every few seconds, seemingly at random, a job will take from 20-60ms!
I'm trying to build a smooth simulation and just queuing my background tasks is causes serious hitches in my framerate.
This is not the time it takes for the job to actually FINISH, this is time spent on simply QUEUEING the job. Which I would expect should take virtually no time at all, so this is very frustrating.
foreach ( Job j in jobs_to_process )
{
Job current_job = j;
if ( !current_job.queued )
{
current_job.queued = true;
Profiler.BeginSample("QueueWorkItem");
ThreadPool.UnsafeQueueUserWorkItem( current_job.DoCalculation, null );
Profiler.EndSample();
}
}
//now I remove queued items from jobs_to_process, move them to jobs_in_progress list
Things I've tried:
Queuing fewer jobs per second only causes proportionately fewer hitches, but it still hitches.
Putting lock(obj) around all access to job.queued variable
Using SmartThreadPool instead of defaultThreadPool. Problem still
persists inside STP's QueueWorkItem()

Try with the --server flag when running mono (only compatible with mono master, as it was committed recently).

Related

Why if I run method one time is done work almost the same time if I run few times in for loop c#

I make one method who doing some simple operations like +, -, *, /.
I need to run this method 1513 times.
Here I try to run this method only once. To see do is working good and how times is be needed for to finish with operations.
Stopwatch st = new Stopwatch();
st.Start();
DiagramValue dv = new DiagramValue();
double pixel = dv.CalculateYPixel(23.46, diction);
st.Stop();
When is stop the stopwatch is teling me the time is 0.06s.
When I run the same method 1513 times in for loop like that:
Stopwatch st = new Stopwatch();
st.Start();
for (int i = 0; i < 1513; i++)
{
DiagramValue dv = new DiagramValue();
double pixel = dv.CalculateYPixel(23.46, diction);
}
st.Stop();
Then the Stopwatch is tell me is working around 0.14s. Or 0.14s / 1513 times = 0.00009s for one time.
My question is why If I running some method only once is too slow and if I running around thousand times in for loop is almost the same time.
Writing benchmarks is hard.
First, Stopwatch isn't infinitely accurate. When you run the method just once, you're very much limited by the accuracy of the underlying stopwatch. On the other hand, running the method multiple times alleviates this - you can get arbitrary precision by using a big enough loop. Instead of 1 vs 1513, compare e.g. 1500 vs. 3000. You'll get around 100% time increase, as expected.
Second, there's usually some cost with the first call in particular (e.g. JIT compilation) or with the memory pressure at the time of the call. That's why you usually need to do "preheating" - run the method outside of the stopwatch first to isolate these, and measure (multiple invocations) later.
Third, in a garbage collected environment like .NET, the guy who ordered the beer isn't necessarily the guy who pays the bill. Most of the cost of memory allocation in .NET is in the collection, rather than the allocation itself (which is about as cheap as a stack allocation). The collection usually happens outside of the code that caused the allocations in the first place, pointing you in the entirely wrong direction when searching for performance issues. That's why most .NET memory trackers display garbage collection separately - it's important to take account of, but can easily mislead you as to the cause if you're not careful.
There's many more issues, but these should cover your particular scenario well enough.
Some possible reasons include:
Timing resolution. You get a more accurate figure when you find the mean over a large number of iterations.
Noise. The percentage of stuff that isn't what you actually want to record, will be different.
Jitting. .NET will create code the first time a method is used. As such the first time it is run in a programs lifetime, the longer it will take, by a large factor (try running it once and then measuring the second attempt).
Branch prediction. If you keep doing the same thing with the same data the CPU's branch predictor is going to get better at predicting which branches are takken.
GC stability. Not likely in this case, but possible. Often at the start of a set of operations that requires particular objects to be created and then released the program ends up having to get more memory from the OS. When it's a bit into that set of operations it's more likely to have reached a steady state where it can just get that memory by cleaning out objects it isn't using any more, which is faster.

High cpu when using PerformanceCounter.NextValue

We have created a monitoring application for our enterprise app that will monitor our applications Performance counters. We monitor a couple system counters (memory, cpu) and 10 or so of our own custom performance counters. We have 7 or 8 exes that we monitor, so we check 80 counters every couple seconds.
Everything works great except when we loop over the counters the cpu takes a hit, 15% or so on my pretty good machine but on other machines we have seen it much higher. We are wanting our monitoring app to run discretely in the background looking for issues, not eating up a significant amount of the cpu.
This can easily be reproduced by this simple c# class. This loads all processes and gets Private Bytes for each. My machine has 150 processes. CallNextValue Takes 1.4 seconds or so and 16% cpu
class test
{
List<PerformanceCounter> m_counters = new List<PerformanceCounter>();
public void Load()
{
var processes = System.Diagnostics.Process.GetProcesses();
foreach (var p in processes)
{
var Counter = new PerformanceCounter();
Counter.CategoryName = "Process";
Counter.CounterName = "Private Bytes";
Counter.InstanceName = p.ProcessName;
m_counters.Add(Counter);
}
}
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue();
}
}
}
Doing this same thing in Perfmon.exe in windows and adding the counter Process - Private Bytes with all processes selected I see virtually NO cpu taken up and it's also graphing all processes.
So how is Perfmon getting the values? Is there a better/different way to get these performance counters in c#?
I've tried using RawValue instead of NextValue and i don't see any difference.
I've played around with Pdh call in c++ (PdhOpenQuery, PdhCollectQueryData, ...). My first tests don't seem like these are any easier on the cpu but i haven't created a good sample yet.
I'm not very familiar with the .NET performance counter API, but I have a guess about the issue.
The Windows kernel doesn't actually have an API to get detailed information about just one process. Instead, it has an API that can be called to "get all the information about all the processes". It's a fairly expensive API call. Every time you do c.NextValue() for one of your counters, the system makes that API call, throws away 99% of the data, and returns the data about the single process you asked about.
PerfMon.exe uses the same PDH APIs, but it uses a wildcard query -- it creates a single query that gets data for all of the processes at once, so it essentially only calls c.NextValue() once every second instead of calling it N times (where N is the number of processes). It gets a huge chunk of data back (data for all of the processes), but it's relatively cheap to scan through that data.
I'm not sure that the .NET performance counter API supports wildcard queries. The PDH API does, and it would be much cheaper to perform one wildcard query than to perform a whole bunch of single-instance queries.
Sorry for a long response, but I've found your question only now. Anyway, if anyone will need additional help, I have a solution:
I've made a little research on my custom process and I've understood that when we have a code snippet like
PerformanceCounter ourPC = new PerformanceCounter("Process", "% Processor time", "processname", true);
ourPC.NextValue();
Then our performance counter's NextValue() will show you the (number of logical cores * task manager cpu load of the process) value which is kind of logical thing, I suppose.
So, your problem may be that you have a slight CPU load in the task manager because it understands that you have a multiple core CPU, although the performance counter counts it by the formula above.
I see a one (kind of crutchy) possible solution for your problem so your code should be rewritten like this:
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue() / Environment.ProcessorCount;
}
}
Anyway, I do not recommend you to use Environment.ProcessorCount although I've used it: I just didn't want to add too much code to my short snippet.
You can see a good way to find out how much logical cores (yeah, if you have core i7, for example, you'll have to count logical cores, not physical) do you have in a system if you'll follow this link:
How to find the Number of CPU Cores via .NET/C#?
Good luck!

Strange Behavior with Threading and Timer

I explain my situation.
I have a producer 1 to N consumers pattern. I'm using blocking collections and everything is working well. Doing some test I noticed this strange behavior:
I was testing how long my manipulation of data took in my consumers.
I noticed this strange things, below you'll find the code cleaned of my manipulation and which produce the strange behavior.
I have 4 consumers for 1 producer.
For most of data, the Console doesn't print anything, because ts=0 (its under a tick) but randomly (between every 1 to 5sec) it plots something like this (not in this very specific order, but of the same kind):
10000
20001
10000
30002
10000
40003
10000
10000
It is of the order of 10,000 ticks so around 1ms. Always a number in the format (N)000(N-1)
Note that the BlockingCollection I consume is filled depending on some network events which occurred completely at random times. Nothing regular from here.
The timing is almost perfect, always a multiple of 10,000 ticks.
What could be behind this ? Thks !
while(IsAlive)
{
DataToFieldMapping item;
try
{
_CollectionToConsume.TryTake(out item, -1);
}
catch
{
item = null;
}
if (item != null)
{
long ts = (DateTime.Now.Ticks - item.TimeStamp.Ticks);
if(ts>10)
Console.WriteLine(ts);
}
}
What's going on here is that DateTime.Now has a fairly limited precision. It's not giving you the time to the nearest tick. It is only updated every 10,000 ticks or so, which is why you generally see multiples of 10k ticks in your prints.
If you really want to get a better feel for the duration of those events, use the StopWatch class, which has a much higher precision. That said, StopWatch is simply a diagnostic tool (hence why it's in the Diagnostics namespace). You should only be using it to help you diagnose what's going on, and should be using it in production code.
On a side note, there really isn't any need to use a timer here at all. It appears that you're creating several consumers that are polling the BlockingCollection for new content. There is no reason to do this. They can simply block until the collection has items. (Hence the name, BlockingCollection.
The easiest way is for the consumers to simply do this:
foreach(var item in _CollectionToConsume.GetConsumingEnumerable())
ProcessItem(item);
Then just run that code in a background thread.
if you write the following and run, you'll see that ticks do not roll one to one, but rather in relatively large chunks b/c ticks resolution is actually much smaller.
for(int i =0; i< 100; i++)
{
Console.WriteLine(DateTime.Now.Ticks);
}
Use Stopwatch class to measure performance as that one uses a high-resolution timer which is much more suitable for the purpose.

PerformanceCounter in while loop

I am currently trying to write an application that runs the same code exactly 100 times a second. I have done some testing using the built-in timers of the .NET framework. I've tested the System.Threading.Timer class, the System.Windows.Forms.Timer class and the System.Timers.Timer class. None of them seem to be accurate enough for what I am trying to do.
I've found out about PerformanceCounters and am currently trying to implement such in my application.
However, I am having a bit of trouble with my program taking up a whole core of my CPU when idling.
I only need it to be active 100 times a second. My loop looks like this:
long nextTick, nextMeasure;
QueryPerformanceCounter(out start);
nextTick = start + countsPerTick;
nextMeasure = start + performanceFrequency;
long currentCount;
while (true)
{
QueryPerformanceCounter(out currentCount);
if (currentCount >= nextMeasure)
{
Debug.Print("Ticks this second: " + tickCount);
tickCount = 0;
nextMeasure += performanceFrequency;
}
if (currentCount >= nextTick)
{
Calculations();
tickCount++;
nextTick += countsPerTick;
}
}
As you can see, most of the time the program will be waiting to run Calculations() again by running through the while loop constantly. Is there a way to stop this from happening? I don't want to slow the computers my program will be run on down.
System.Thread.Thread.Sleep unfortunately is also pretty "inaccurate", but I would be okay with using it if there is no other solution.
What I am basically asking is this: Is there a way to make an infinite loop less CPU-intensive? Is there any other way of accurately waiting for a specific amount of time?
As I'm sure you're aware, Windows is not a real-time O/S, so there can never be any guarantee that your code will run as often as you want.
Having said that, the most efficient in terms of yielding to other threads is probably to use Thread.Sleep() as the timer. If you want higher accuracy than the default you can issue a timeBeginPeriod with the desired resolution down to a millisecond. The function must be DLLImported from winmm.dll.
timeBeginPeriod(1) together with a normal timer or Thread.Sleep should work decently.
Note that this has a global effect. There are claims that it increases power consumption, since it forces the windows timer to run more often, shortening the CPU sleeping periods. This means you should generally avoid it. But if you need highly accurate timing, it's certainly a better choice than a busy-wait.

multithread performance problem for web service call

Here is my sample program for web service server side and client side. I met with a strnage performance problem, which is, even if I increase the number of threads to call web services, the performance is not improved. At the same time, the CPU/memory/network consumption from performance panel of task manager is low. I am wondering what is the bottleneck and how to improve it?
(My test experience, double the number of threads will almost double the total response time)
Client side:
class Program
{
static Service1[] clients = null;
static Thread[] threads = null;
static void ThreadJob (object index)
{
// query 1000 times
for (int i = 0; i < 100; i++)
{
clients[(int)index].HelloWorld();
}
}
static void Main(string[] args)
{
Console.WriteLine("Specify number of threads: ");
int number = Int32.Parse(Console.ReadLine());
clients = new Service1[number];
threads = new Thread[number];
for (int i = 0; i < number; i++)
{
clients [i] = new Service1();
ParameterizedThreadStart starter = new ParameterizedThreadStart(ThreadJob);
threads[i] = new Thread(starter);
}
DateTime begin = DateTime.Now;
for (int i = 0; i < number; i++)
{
threads[i].Start(i);
}
for (int i = 0; i < number; i++)
{
threads[i].Join();
}
Console.WriteLine("Total elapsed time (s): " + (DateTime.Now - begin).TotalSeconds);
return;
}
}
Server side:
[WebMethod]
public double HelloWorld()
{
return new Random().NextDouble();
}
thanks in advance,
George
Although you are creating a multithreaded client, bear in mind that .NET has a configurable bottleneck of 2 simultaneous calls to a single host. This is by design.
Note that this is on the client, not the server.
Try adjusting your app.config file in the client:
<system.net>
<connectionManagement>
<add address=“*” maxconnection=“20″ />
</connectionManagement></system.net>
There is some more info on this in this short article :
My experience is generally that locking is the problem: I had a massively parallel server once that spent more time context switching than it did performing work.
So - check your memory and process counters in perfmon, if you look at context switches and its high (more than 4000 per second) then you're in trouble.
You can also check your memory stats on the server too - if its spending all its time swapping, or just creating and freeing strings, it'll appear to stall also.
Lastly, check disk I/O, same reason as above.
The resolution is to remove your locks, or hold them for a minimum of time. Our problem was solved by removing the dependence on COM BSTRs and their global lock, you'll find that C# has plenty of similar synchronisation bottlenecks (intended to keep your code working safely). I've seen performance drop when I moved a simple C# app from a single-core to a multi-core box.
If you cannot remove the locks, the best option is not to create as many threads :) Use a thread pool instead to let the CPU finish one job before starting another.
I don't believe that you are running into a bottleneck at all actually.
Did you try what I suggested ?
Your idea is to add more threads to improve performance, because you are expecting that all of your threads will run perfectly in parallel. This is why you are assuming that doubling the number of threads should not double the total test time.
Your service takes a fraction of a second to return and your threads will not all start working at exactly the same instant in time on the client.
So your threads are not actually working completely in parallel as you have assumed, and the results you are seeing are to be expected.
You are not seeing any performance gain because there is none to be had. The one line of code in your service (below) probably executes without a context switch most of the time anyway.
return new Random().NextDouble();
The overhead involved in the web service call is higher than than the work you are doing inside of it. If you have some substantial work to do inside the service (database calls, look-ups, file access etc) you may begin to see some performance increase.
Just parallelizing a task will not automatically make it faster.
-Jason
Of course adding Sleep will not improve performance.
But the point of the test is to test with a variable number of threads.
So, keep the Sleep in your WebMethod.
And try now with 5, 10, 20 threads.
If there are no other problems with your code, then the increase in time should not be linear as before.
You realize that in your test, when you double the amount of threads, you are doubling the amount of work that is being done. So if your threads are not truly executing in parallel, then you will, of course, see a linear increase in total time...
I ran a simple test using your client code (with a sleep on the service).
For 5 threads, I saw a total time of about 53 seconds.
And for 10 threads, 62 seconds.
So, for 2x the number of calls to the webservice, it only took 17% more time.. That is what you are expecting, no ?
Well, in this case, you're not really balancing your work between the chosen n.º of threads... Each Thread you create will be performing the same Job. So if you create n threads and you have a limited parallel processing capacity, the performance naturally decreases. Another think I notice is that the required Job is a relatively fast operation for 100 iterations and even if you plan on dividing this Job through multiple threads you need to consider that the time spent in context switching, thread creation/deletion will be an important factor in the overall time.
As bruno mentioned, your webmethod is a very quick operation. As an experiment, try ensuring that your HelloWorld method takes a bit longer. Throw in a Thread.Sleep(1000) before you return the random double. This will make it more likely that your service is actually forced to process requests in parallel.
Then try your client with different amounts of threads, and see how the performance differs.
Try to use some processor consuming task instead of Thread.Sleep. Actually combined approach is the best.
Sleep will just pass thread's time frame to another thread.
IIS AppPool "Maximum Worker Processes" is set to 1 by default. For some reason, each worker process is limited to process 10 service calls at a time. My WCF async server-side function does Sleep(10*1000); only.
This is what happens when Maximum Worker Processes = 1
http://s4.postimg.org/4qc26cc65/image.png
alternatively
http://i.imgur.com/C5FPbpQ.png?1
(First post on SO, I need to combine all pictures into one picture.)
The client is making 48 async WCF WS calls in this test (using 16 processes). Ideally this should take ~10 seconds to complete (Sleep(10000)), but it takes 52 seconds. You can see 5 horizontal lines in the perfmon picture (above link) (using perfmon for monitoring Web Service Current Connections in server). Each horizontal line lasts 10 seconds (which Sleep(10000) does). There are 5 horizontal lines because the server processes 10 calls each time then closes that 10 connections (this happens 5 times to process 48 calls). Completion of all calls took 52 seconds.
After setting Maximum Worker Processes = 2
(in the same picture given above)
This time there are 3 horizontal lines because the server processes 20 calls each time then closes that 20 connections (this happens 3 times to process 48 calls). Took 33 secs.
After setting Maximum Worker Processes = 3
(in the same picture given above)
This time there are 2 horizontal lines because the server processes 30 calls each time. (happens 2 times to process 48 calls) Took 24 seconds.
After setting Maximum Worker Processes = 8
(in the same picture given above)
This time there is 1 horizontal line because the server processes 80 calls each time. (happens once to process 48 calls) Took 14 seconds.
If you don't care this situation, your parallel (async or threaded) client calls will be queued by 10s in the server, then all of your threaded calls (>10) won't get processed by the server in parallel.
PS: I was using Windows 8 x64 with IIS 8.5. The 10 concurrent request limit is for workstation Windows OSes. Server OSes doesn't have that limit according to another post on SO (I can't give link due to rep < 10).

Categories