multithread performance problem for web service call

multithread performance problem for web service call - c#

Here is my sample program for web service server side and client side. I met with a strnage performance problem, which is, even if I increase the number of threads to call web services, the performance is not improved. At the same time, the CPU/memory/network consumption from performance panel of task manager is low. I am wondering what is the bottleneck and how to improve it?
(My test experience, double the number of threads will almost double the total response time)
Client side:
class Program
{
static Service1[] clients = null;
static Thread[] threads = null;
static void ThreadJob (object index)
{
// query 1000 times
for (int i = 0; i < 100; i++)
{
clients[(int)index].HelloWorld();
}
}
static void Main(string[] args)
{
Console.WriteLine("Specify number of threads: ");
int number = Int32.Parse(Console.ReadLine());
clients = new Service1[number];
threads = new Thread[number];
for (int i = 0; i < number; i++)
{
clients [i] = new Service1();
ParameterizedThreadStart starter = new ParameterizedThreadStart(ThreadJob);
threads[i] = new Thread(starter);
}
DateTime begin = DateTime.Now;
for (int i = 0; i < number; i++)
{
threads[i].Start(i);
}
for (int i = 0; i < number; i++)
{
threads[i].Join();
}
Console.WriteLine("Total elapsed time (s): " + (DateTime.Now - begin).TotalSeconds);
return;
}
}
Server side:
[WebMethod]
public double HelloWorld()
{
return new Random().NextDouble();
}
thanks in advance,
George

Although you are creating a multithreaded client, bear in mind that .NET has a configurable bottleneck of 2 simultaneous calls to a single host. This is by design.
Note that this is on the client, not the server.
Try adjusting your app.config file in the client:
<system.net>
<connectionManagement>
<add address=“*” maxconnection=“20″ />
</connectionManagement></system.net>
There is some more info on this in this short article :

My experience is generally that locking is the problem: I had a massively parallel server once that spent more time context switching than it did performing work.
So - check your memory and process counters in perfmon, if you look at context switches and its high (more than 4000 per second) then you're in trouble.
You can also check your memory stats on the server too - if its spending all its time swapping, or just creating and freeing strings, it'll appear to stall also.
Lastly, check disk I/O, same reason as above.
The resolution is to remove your locks, or hold them for a minimum of time. Our problem was solved by removing the dependence on COM BSTRs and their global lock, you'll find that C# has plenty of similar synchronisation bottlenecks (intended to keep your code working safely). I've seen performance drop when I moved a simple C# app from a single-core to a multi-core box.
If you cannot remove the locks, the best option is not to create as many threads :) Use a thread pool instead to let the CPU finish one job before starting another.

I don't believe that you are running into a bottleneck at all actually.
Did you try what I suggested ?
Your idea is to add more threads to improve performance, because you are expecting that all of your threads will run perfectly in parallel. This is why you are assuming that doubling the number of threads should not double the total test time.
Your service takes a fraction of a second to return and your threads will not all start working at exactly the same instant in time on the client.
So your threads are not actually working completely in parallel as you have assumed, and the results you are seeing are to be expected.

You are not seeing any performance gain because there is none to be had. The one line of code in your service (below) probably executes without a context switch most of the time anyway.
return new Random().NextDouble();
The overhead involved in the web service call is higher than than the work you are doing inside of it. If you have some substantial work to do inside the service (database calls, look-ups, file access etc) you may begin to see some performance increase.
Just parallelizing a task will not automatically make it faster.
-Jason

Of course adding Sleep will not improve performance.
But the point of the test is to test with a variable number of threads.
So, keep the Sleep in your WebMethod.
And try now with 5, 10, 20 threads.
If there are no other problems with your code, then the increase in time should not be linear as before.
You realize that in your test, when you double the amount of threads, you are doubling the amount of work that is being done. So if your threads are not truly executing in parallel, then you will, of course, see a linear increase in total time...
I ran a simple test using your client code (with a sleep on the service).
For 5 threads, I saw a total time of about 53 seconds.
And for 10 threads, 62 seconds.
So, for 2x the number of calls to the webservice, it only took 17% more time.. That is what you are expecting, no ?

Well, in this case, you're not really balancing your work between the chosen n.º of threads... Each Thread you create will be performing the same Job. So if you create n threads and you have a limited parallel processing capacity, the performance naturally decreases. Another think I notice is that the required Job is a relatively fast operation for 100 iterations and even if you plan on dividing this Job through multiple threads you need to consider that the time spent in context switching, thread creation/deletion will be an important factor in the overall time.

As bruno mentioned, your webmethod is a very quick operation. As an experiment, try ensuring that your HelloWorld method takes a bit longer. Throw in a Thread.Sleep(1000) before you return the random double. This will make it more likely that your service is actually forced to process requests in parallel.
Then try your client with different amounts of threads, and see how the performance differs.

Try to use some processor consuming task instead of Thread.Sleep. Actually combined approach is the best.
Sleep will just pass thread's time frame to another thread.

IIS AppPool "Maximum Worker Processes" is set to 1 by default. For some reason, each worker process is limited to process 10 service calls at a time. My WCF async server-side function does Sleep(10*1000); only.
This is what happens when Maximum Worker Processes = 1
http://s4.postimg.org/4qc26cc65/image.png
alternatively
http://i.imgur.com/C5FPbpQ.png?1
(First post on SO, I need to combine all pictures into one picture.)
The client is making 48 async WCF WS calls in this test (using 16 processes). Ideally this should take ~10 seconds to complete (Sleep(10000)), but it takes 52 seconds. You can see 5 horizontal lines in the perfmon picture (above link) (using perfmon for monitoring Web Service Current Connections in server). Each horizontal line lasts 10 seconds (which Sleep(10000) does). There are 5 horizontal lines because the server processes 10 calls each time then closes that 10 connections (this happens 5 times to process 48 calls). Completion of all calls took 52 seconds.
After setting Maximum Worker Processes = 2
(in the same picture given above)
This time there are 3 horizontal lines because the server processes 20 calls each time then closes that 20 connections (this happens 3 times to process 48 calls). Took 33 secs.
After setting Maximum Worker Processes = 3
(in the same picture given above)
This time there are 2 horizontal lines because the server processes 30 calls each time. (happens 2 times to process 48 calls) Took 24 seconds.
After setting Maximum Worker Processes = 8
(in the same picture given above)
This time there is 1 horizontal line because the server processes 80 calls each time. (happens once to process 48 calls) Took 14 seconds.
If you don't care this situation, your parallel (async or threaded) client calls will be queued by 10s in the server, then all of your threaded calls (>10) won't get processed by the server in parallel.
PS: I was using Windows 8 x64 with IIS 8.5. The 10 concurrent request limit is for workstation Windows OSes. Server OSes doesn't have that limit according to another post on SO (I can't give link due to rep < 10).

Related

High cpu when using PerformanceCounter.NextValue

We have created a monitoring application for our enterprise app that will monitor our applications Performance counters. We monitor a couple system counters (memory, cpu) and 10 or so of our own custom performance counters. We have 7 or 8 exes that we monitor, so we check 80 counters every couple seconds.
Everything works great except when we loop over the counters the cpu takes a hit, 15% or so on my pretty good machine but on other machines we have seen it much higher. We are wanting our monitoring app to run discretely in the background looking for issues, not eating up a significant amount of the cpu.
This can easily be reproduced by this simple c# class. This loads all processes and gets Private Bytes for each. My machine has 150 processes. CallNextValue Takes 1.4 seconds or so and 16% cpu
class test
{
List<PerformanceCounter> m_counters = new List<PerformanceCounter>();
public void Load()
{
var processes = System.Diagnostics.Process.GetProcesses();
foreach (var p in processes)
{
var Counter = new PerformanceCounter();
Counter.CategoryName = "Process";
Counter.CounterName = "Private Bytes";
Counter.InstanceName = p.ProcessName;
m_counters.Add(Counter);
}
}
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue();
}
}
}
Doing this same thing in Perfmon.exe in windows and adding the counter Process - Private Bytes with all processes selected I see virtually NO cpu taken up and it's also graphing all processes.
So how is Perfmon getting the values? Is there a better/different way to get these performance counters in c#?
I've tried using RawValue instead of NextValue and i don't see any difference.
I've played around with Pdh call in c++ (PdhOpenQuery, PdhCollectQueryData, ...). My first tests don't seem like these are any easier on the cpu but i haven't created a good sample yet.

I'm not very familiar with the .NET performance counter API, but I have a guess about the issue.
The Windows kernel doesn't actually have an API to get detailed information about just one process. Instead, it has an API that can be called to "get all the information about all the processes". It's a fairly expensive API call. Every time you do c.NextValue() for one of your counters, the system makes that API call, throws away 99% of the data, and returns the data about the single process you asked about.
PerfMon.exe uses the same PDH APIs, but it uses a wildcard query -- it creates a single query that gets data for all of the processes at once, so it essentially only calls c.NextValue() once every second instead of calling it N times (where N is the number of processes). It gets a huge chunk of data back (data for all of the processes), but it's relatively cheap to scan through that data.
I'm not sure that the .NET performance counter API supports wildcard queries. The PDH API does, and it would be much cheaper to perform one wildcard query than to perform a whole bunch of single-instance queries.

Sorry for a long response, but I've found your question only now. Anyway, if anyone will need additional help, I have a solution:
I've made a little research on my custom process and I've understood that when we have a code snippet like
PerformanceCounter ourPC = new PerformanceCounter("Process", "% Processor time", "processname", true);
ourPC.NextValue();
Then our performance counter's NextValue() will show you the (number of logical cores * task manager cpu load of the process) value which is kind of logical thing, I suppose.
So, your problem may be that you have a slight CPU load in the task manager because it understands that you have a multiple core CPU, although the performance counter counts it by the formula above.
I see a one (kind of crutchy) possible solution for your problem so your code should be rewritten like this:
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue() / Environment.ProcessorCount;
}
}
Anyway, I do not recommend you to use Environment.ProcessorCount although I've used it: I just didn't want to add too much code to my short snippet.
You can see a good way to find out how much logical cores (yeah, if you have core i7, for example, you'll have to count logical cores, not physical) do you have in a system if you'll follow this link:
How to find the Number of CPU Cores via .NET/C#?
Good luck!

How capturing the accurate execution time in c#

I try to capture the exact execution time of function
Stopwatch regularSW = new Stopwatch();
for (int i = 0; i < 10; i++) {
regularSW.Start();
//function();
regularSW.Stop();
Console.WriteLine("Measured time: " + regularSW.Elapsed);
}
I also tried with DateTime and Process.GetCurrentProcess().TotalProcessorTime
but each time I get a different value.
How i can get same value ?

With StopWatch you already use the most accurate way. But you are not re-starting it in the loop. It always starts at the value where it ended. You either have to create a new StopWatch or call StopWatch.Restart instead of Start:
Stopwatch regularSW = new Stopwatch();
for (int i = 0; i < 10; i++) {
regularSW.Restart();
//function();
regularSW.Stop();
Console.WriteLine("Measured time: " + regularSW.Elapsed);
}
That's the reason for the different values. If you now still get different values, then the reason is that the method function really has different execution times which is not that unlikely(f.e. if it's a database query).
Since this question seems to be largely theoretical(regarding your comments), consider following things if you want to measure time in .NET:
compile and run in release mode, Any CPU (on an x64 machine) and optimizations on
A tick is 0.0001 milliseconds, so don't overestimate your results
They are different because you cannot control what other operations your system might need to perform in the background while your C# progam is running
If you for example claim memory in the method because you fill a local list, then the garbage collector might attempt to reclaim garbage(memory)
C# code is compiled Just In Time. The first time you go through a loop can therefore be hundreds or thousands of times more expensive than every subsequent time due to the cost of the jitter analyzing the code that the loop calls. If you are intending on measuring the "warm" cost of a loop then you need to run the loop once before you start timing it. If you are intending on measuring the average cost including the jit time then you need to decide how many times makes up a reasonable number of trials, so that the average works out correctly
you are running your code in a multithreaded, multiprocessor environment where threads can be switched at will, and where the thread quantum (the amount of time the operating system will give another thread until yours might get a chance to run again) is about 16 milliseconds. 16 milliseconds is about fifty million processor cycles. Coming up with accurate timings of sub-millisecond operations can be quite difficult if the thread switch happens within one of the several million processor cycles that you are trying to measure. Take that into consideration.
The last two points were copied from this answer of Eric Lippert (worth reading).

QueueUserWorkItem() performance issues using Mono C#

I'm getting intermittent spikes in my profiler around the call to QueueUserWorkItem(). I queue roughly 20 jobs every second. Each job is roughly identical in terms of complexity. 95% of the jobs spend less than 0.01ms in the call to QueueUserWorkItem. But every few seconds, seemingly at random, a job will take from 20-60ms!
I'm trying to build a smooth simulation and just queuing my background tasks is causes serious hitches in my framerate.
This is not the time it takes for the job to actually FINISH, this is time spent on simply QUEUEING the job. Which I would expect should take virtually no time at all, so this is very frustrating.
foreach ( Job j in jobs_to_process )
{
Job current_job = j;
if ( !current_job.queued )
{
current_job.queued = true;
Profiler.BeginSample("QueueWorkItem");
ThreadPool.UnsafeQueueUserWorkItem( current_job.DoCalculation, null );
Profiler.EndSample();
}
}
//now I remove queued items from jobs_to_process, move them to jobs_in_progress list
Things I've tried:
Queuing fewer jobs per second only causes proportionately fewer hitches, but it still hitches.
Putting lock(obj) around all access to job.queued variable
Using SmartThreadPool instead of defaultThreadPool. Problem still
persists inside STP's QueueWorkItem()

Try with the --server flag when running mono (only compatible with mono master, as it was committed recently).

Ten times Thread.Sleep(100) or a single Thread.Sleep(1000)?

Is there a difference (from performance perspective) between:
Thread.Sleep(10000);
and
for(int i=0; i<100; i++)
{
Thread.Sleep(100);
}
Does the single call to Thread.Sleep(10000) also results in context switches within this 10 seconds (so the OS can see if it's done sleeping), or is this thread really not served for 10 seconds?

The second code (for loop) requires more process swaps and should be little slower than Thread.Sleep(10000);
Anyway you can use System.Diagnostics.Stopwatch class to determine the exact time for these two approaches. I believe the difference will be very very small.

in any case second loop will take time because of following overheads
Memory utilization for 10 different thread objects
10 different callbacks will be initiated once you call thread.sleep
Overhead cost for running loop
if we want to run the code on single thread so why do we want a loop if we don't have any break point even.

Acceptable use of Thread.Sleep()

I'm working on a console application which will be scheduled and run at set intervals, say every 30 minutes. Its only purpose is to query a Web Service to update a batch of database rows.
The Web Service API reccommends calling once every 30 seconds, and timeout after a set interval. The following pseudocode is given as an example:
listId := updateList(<list of terms>)
LOOP
WHILE NOT isUpdatingComplete(listId)
END LOOP
statuses := getStatuses(“LIST_ID = {listId}”)
I have coded this roughly in C# as:
int callCount = 0;
while( callCount < 5 && !client.isUpdateComplete(listId, out messages) )
{
listId = client.updateList(options, terms, out messages);
callCount++;
Thread.Sleep(30000);
}
// Get resulting status...
Is it OK in this situation to use Thread.Sleep()? I'm aware it is not generally good practice but from reading reasons not to use it this seems like acceptable usage.
Thanks.

Thread.Sleep ensures the current thread doesn't return until at least the specified milliseconds have passed. There are plenty of places it's appropriate to do that, and your example seems fine, assuming it's running on a background thread.
Some example places you don't want to use it - on the UI thread or where you need to do exact timing.

Generally speaking, Thread.Sleep is like any other tool: perfectly OK to use, except when it's terribly misused. I disagree with the "not generally good practice" part, which is the result of people abusing Thread.Sleep when they should be doing something else (i.e. blocking on a synchronization object).
In your case the program is single-threaded, it has no UI (i.e. the thread has no message loop) and you do not want to synchronize with external events. Therefore Thread.Sleep is just fine.

The general objection against Sleep() is that it wastes a Thread.
In your case there is only 1 Thread (maybe 2) so that is not really a problem.
So I think it looks fine (but I would sleep 29 seconds to cut some slack).

It's fine, except that you cannot interrupt it once it goes into sleep, without aborting the thread (which is not recommended).
That's why a ManualResetEvent might be a better idea, since it can be signalled ("awaken") from a different thread.

you could stick with the Thread.Sleep method. But it would be more elegant to schedule it to run every 30 minutes - so you don't have to take care of the waiting inside your application.

Thread.Sleep isn't the best for executing periodic logic. Thread.Sleep(n) means your thread will relinquish control for n milliseconds. There is no guarantee that it will regain control after n milliseconds, it depends on the CPU load.

If you are locking the thread for 30 mins case you should schedule a windows task every 30 mins, so the program executes and then ends. That way you are not locking a thread for so long.
For shorter times, like 30 secs / 1 min, System.Thread.Sleep() is perfectly fine. For more than 5 mins i would use a windows task. (Im spanish i think on the english version are called like that, im talking about the tasks you schedule from the control panel ;-) )

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.