I have a multi-threaded application with four threads, and I want to know how much processing time is spent in each thread. I've created all of these threads with the ThreadPool:
Thread1 does job1
Thread2 does job2
..
..
The result would be:
Thread1 was running for 12 milliseconds
Thread2 was running for 20 milliseconds
Each job actually downloads a web page, and each job is processed on one thread. I want to know how long it takes to download a web page, without the time lost to context switches to other threads being counted in the measurement.
I found this code on CodeProject:
http://www.codeproject.com/KB/dotnet/ExecutionStopwatch.aspx
Try it and report back ;)
If you want the total elapsed time, as you would get from a stopwatch, there's the Stopwatch class:
Stopwatch sw = Stopwatch.StartNew();
// execute code
sw.Stop();
// read and report on sw.ElapsedMilliseconds
If you want to find out how much time the thread was actually executing code (and not waiting for I/O etc.) you can examine the ProcessThread.TotalProcessorTime property, by enumerating the threads of the Process object for your application.
Note that threads in the thread pool are not destroyed after use, but left in the pool for reuse, which means your total time for a thread includes everything it has done before your current workload.
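A minimal sketch of the snapshot-and-diff approach, assuming you can take a snapshot right before and right after the job runs. The helper names (ThreadCpuSnapshot, Take, Delta) are hypothetical; the entries are keyed by ProcessThread.Id, which is the OS thread ID, not Thread.ManagedThreadId. Because pool threads are reused, only the difference between two snapshots tells you what the current workload consumed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ThreadCpuSnapshot
{
    // Captures TotalProcessorTime for every OS thread in the current process,
    // keyed by OS thread ID (ProcessThread.Id, not Thread.ManagedThreadId).
    public static Dictionary<int, TimeSpan> Take()
    {
        var result = new Dictionary<int, TimeSpan>();
        using Process process = Process.GetCurrentProcess();
        foreach (ProcessThread thread in process.Threads)
        {
            try
            {
                result[thread.Id] = thread.TotalProcessorTime;
            }
            catch (InvalidOperationException)
            {
                // The thread exited between enumeration and the property read.
            }
        }
        return result;
    }

    // CPU time each thread consumed between two snapshots. Threads that only
    // appear in 'after' are counted from zero; threads that exited are omitted.
    public static Dictionary<int, TimeSpan> Delta(
        Dictionary<int, TimeSpan> before, Dictionary<int, TimeSpan> after)
    {
        var delta = new Dictionary<int, TimeSpan>();
        foreach (var pair in after)
        {
            before.TryGetValue(pair.Key, out TimeSpan start); // TimeSpan.Zero if absent
            delta[pair.Key] = pair.Value - start;
        }
        return delta;
    }
}
```

This measures processor time (user plus kernel), so time the thread spends blocked waiting for the download or preempted by other threads is not included.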
The WMI class Win32_Thread contains the properties KernelModeTime and UserModeTime, which, if available, give you the actual execution time in milliseconds.
But, from the documentation:
If this information is not available, a value of 0 (zero) should be used.
So this might be OS dependent (it is certainly populated here on Win7).
A query like select * from win32_thread where ProcessHandle="x" will return the Win32_Thread instances for process ID x (ignore the "handle" in the name). For example, using PowerShell to look at its own threads:
PS[64bit] > gwmi -Query "select * from win32_thread where ProcessHandle=""7064"""|
ft -AutoSize Handle,KernelModeTime,UserModeTime
Handle KernelModeTime UserModeTime
------ -------------- ------------
  5548            218          312
  6620              0            0
  6112              0            0
  7148              0           15
  6888              0            0
  7380              0            0
  3992              0            0
  8372              0            0
   644              0            0
  1328              0           15
(And to confirm this is not elapsed time: the process start time is 16:44:50 on 2010-09-30.)
It cannot be done. The problem is that unless you prevent it (which is hard to do and makes little sense), threads can be interrupted. So while it TAKES Thread2 20 ms to complete, you do not know how much of that time it was actually active.
This is the downside of what is called preemptive multitasking.
I'm investigating Break() in a Parallel.For loop.
After reading this and this I still have a question:
I'd expect this code:
Parallel.For(0, 10, (i, state) =>
{
    Console.WriteLine(i);
    if (i == 5) state.Break();
});
To yield at most 6 numbers (0..5).
Not only does it not do that, but the results have different lengths:
02351486
013542
0135642
Very annoying. (Where is the Break() after 5 here??)
So I looked at MSDN:
Break may be used to communicate to the loop that no other iterations after the current iteration need be run. If Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000, all iterations less than 100 should still be run, but the iterations from 101 through to 1000 are not necessary.
Question #1:
Which iterations? The overall iteration counter, or per thread? I'm pretty sure it is per thread. Please confirm.
Question #2:
Let's assume we are using Parallel.For with range partitioning (since the CPU cost doesn't vary between elements), so the data is divided among the threads. If we have 4 cores (and a perfect division among them):
core #1 got 0..250
core #2 got 251..500
core #3 got 501..750
core #4 got 751..1000
So the thread on core #1 will reach value 100 at some point and call Break(); that will be its 100th iteration.
But the thread on core #4 got more time slices and is now at 900, way beyond its 100th iteration.
It has no index below 100 at which to be stopped, so it will show them all!
Am I right? Is that the reason why I get more than 5 elements in my example?
Question #3:
How can I truly break when (i == 5)?
P.S.
I mean, come on! When I call Break(), I want the loop to stop,
exactly as it does in a regular for loop.
To yield at most 6 numbers (0..5).
The problem is that this won't yield at most 6 numbers.
What happens is, when you hit a loop with an index of 5, you send the "break" request. Break() will cause the loop to no longer process any values >5, but process all values <5.
However, any values greater than 5 which were already started will still get processed. Since the various indices are running in parallel, they're no longer ordered, so you get various runs where some values >5 (such as 8 in your example) are still being executed.
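A small sketch that makes the Break() guarantee observable, assuming a ConcurrentBag is an acceptable way to record which indices ran (BreakDemo and Run are hypothetical names). Every index up to and including 5 is guaranteed to appear; which indices above 5 appear varies from run to run. The returned ParallelLoopResult reports the lowest index at which Break() was called.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BreakDemo
{
    // Runs Parallel.For over [0, count) and calls Break() at breakAt.
    // Returns the sorted indices that actually ran, plus the loop result.
    public static (int[] Ran, ParallelLoopResult Result) Run(int count, int breakAt)
    {
        var ran = new ConcurrentBag<int>();
        ParallelLoopResult result = Parallel.For(0, count, (i, state) =>
        {
            // Iterations already started when Break() is observed still run to completion.
            ran.Add(i);
            if (i == breakAt)
            {
                state.Break();
            }
        });
        int[] sorted = ran.ToArray();
        Array.Sort(sorted);
        return (sorted, result);
    }
}
```

After BreakDemo.Run(10, 5), result.IsCompleted is false and result.LowestBreakIteration is 5; the Ran array always contains 0 through 5 and possibly some larger indices.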
Which iterations? The overall iteration counter, or per thread? I'm pretty sure it is per thread. Please confirm.
It is the index being passed into your delegate by Parallel.For. Break() won't prevent already-started items from being processed; it guarantees that all items up to 100 get processed, while items above 100 may or may not get processed.
Am I right? Is that the reason why I get more than 5 elements in my example?
Yes. If you use a partitioner like the one you've shown, items beyond the one where you break will no longer get scheduled as soon as you call Break(). However, items that have already been scheduled (which may be an entire partition) will get processed fully. In your example, this means you're likely to always process all 1000 items.
How can I truly break when (i == 5)?
You are - but when you run in Parallel, things change. What is the actual goal here? If you only want to process the first 6 items (0-5), you should restrict the items before you loop through them via a LINQ query or similar. You can then process the 6 items in Parallel.For or Parallel.ForEach without a Break() and without worry.
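A minimal sketch of restricting the items before the loop, as suggested above. It uses LINQ's Take to keep only the first six items (0..5), so the parallel loop needs no Break() at all; FirstSix and ProcessFirst are hypothetical names, and adding to a ConcurrentBag stands in for the real per-item work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class FirstSix
{
    // Restricts the input *before* the parallel loop, so no Break() is needed.
    public static int[] ProcessFirst(int total, int take)
    {
        var processed = new ConcurrentBag<int>();
        var items = Enumerable.Range(0, total).Take(take); // 0..5 when take == 6
        Parallel.ForEach(items, item =>
        {
            processed.Add(item); // stand-in for the real per-item work
        });
        int[] result = processed.ToArray();
        Array.Sort(result);
        return result;
    }
}
```

The items may still be processed in any order, but exactly the first six are processed, every time.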
I mean, come on! When I call Break(), I want the loop to stop, exactly as it does in a regular for loop.
You should use Stop() instead of Break() if you want things to stop as quickly as possible. This will not stop items that are already running, but no further items will be scheduled (including ones at lower indices, or earlier in the enumeration, than your current position).
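A sketch of Stop() combined with cooperative cancellation, assuming the loop body is long enough to be worth interrupting (StopDemo is a hypothetical name, and Thread.SpinWait stands in for real work). Items that are already running do not stop by themselves; polling ParallelLoopState.ShouldExitCurrentIteration lets them bail out early once Stop() has been requested.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StopDemo
{
    // Calls Stop() when stopAt is reached; the other iterations poll
    // ShouldExitCurrentIteration so they can exit early as well.
    public static (int Started, ParallelLoopResult Result) Run(int count, int stopAt)
    {
        int started = 0;
        ParallelLoopResult result = Parallel.For(0, count, (i, state) =>
        {
            Interlocked.Increment(ref started);
            if (i == stopAt)
            {
                state.Stop();
                return;
            }
            for (int step = 0; step < 100; step++)
            {
                // True once Stop() (or Break()) has been requested for this iteration.
                if (state.ShouldExitCurrentIteration) return;
                Thread.SpinWait(1000); // placeholder for a chunk of real work
            }
        });
        return (started, result);
    }
}
```

Unlike Break(), Stop() leaves result.LowestBreakIteration as null, and result.IsCompleted is false.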
If Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000
The 100th iteration of the loop is not necessarily (in fact probably not) the one with the index 99.
Your threads can and will run in an indeterminate order. When the .Break() instruction is encountered, no further iterations with a higher index will be started (lower indices still run). Exactly when that happens depends on the specifics of thread scheduling for a particular run.
I strongly recommend reading Patterns of Parallel Programming (free PDF from Microsoft) to understand the design decisions and tradeoffs that went into the TPL.
Which iterations? The overall iteration counter, or per thread?
Of all the iterations scheduled (or yet to be scheduled).
Remember that the delegate may run out of order: there is no guarantee that iteration i == 5 will be the sixth to execute; in fact, that is unlikely except in rare cases.
Q2: Am I right?
No, the scheduling is not that simplistic. Rather, all the tasks are queued up and then the queue is processed. But each thread uses its own queue until it is empty, at which point it steals work from the other threads. This means there is no way to predict which thread will process which delegate.
If the delegates are sufficiently trivial it might all be processed on the original calling thread (no other thread gets a chance to steal work).
Q3: How can I truly break when (i == 5)?
Don't use concurrency if you want linear (in-order) processing.
The Break method is there to support speculative execution: try various ways and stop as soon as any one completes.
I'm working on a console application which will be scheduled and run at set intervals, say every 30 minutes. Its only purpose is to query a Web Service to update a batch of database rows.
The Web Service API recommends calling once every 30 seconds and timing out after a set interval. The following pseudocode is given as an example:
listId := updateList(<list of terms>)
LOOP
    WHILE NOT isUpdatingComplete(listId)
END LOOP
statuses := getStatuses("LIST_ID = {listId}")
I have coded this roughly in C# as:
int callCount = 0;
while (callCount < 5 && !client.isUpdateComplete(listId, out messages))
{
    listId = client.updateList(options, terms, out messages);
    callCount++;
    Thread.Sleep(30000);
}
// Get resulting status...
Is it OK in this situation to use Thread.Sleep()? I'm aware it is not generally good practice but from reading reasons not to use it this seems like acceptable usage.
Thanks.
Thread.Sleep ensures the current thread doesn't return until at least the specified milliseconds have passed. There are plenty of places it's appropriate to do that, and your example seems fine, assuming it's running on a background thread.
Some examples of places you don't want to use it: on the UI thread, or where you need exact timing.
Generally speaking, Thread.Sleep is like any other tool: perfectly OK to use, except when it's terribly misused. I disagree with the "not generally good practice" part, which is the result of people abusing Thread.Sleep when they should be doing something else (i.e. blocking on a synchronization object).
In your case the program is single-threaded, it has no UI (i.e. the thread has no message loop) and you do not want to synchronize with external events. Therefore Thread.Sleep is just fine.
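A minimal sketch of the bounded polling loop from the pseudocode, assuming the "is it complete?" check can be passed in as a delegate. Poller and WaitUntil are hypothetical names; the loop skips the pointless sleep after the final attempt.

```csharp
using System;
using System.Threading;

public static class Poller
{
    // Polls isComplete up to maxAttempts times, sleeping 'interval' between polls.
    // Returns true as soon as isComplete() reports completion, false on timeout.
    public static bool WaitUntil(Func<bool> isComplete, int maxAttempts, TimeSpan interval)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            if (isComplete()) return true;
            if (attempt < maxAttempts - 1)
            {
                // Acceptable here: a single console/background thread with no UI.
                Thread.Sleep(interval);
            }
        }
        return false;
    }
}
```

With the names from the question (client, listId, messages), and calling updateList once before polling as the pseudocode does, usage might look like `bool done = Poller.WaitUntil(() => client.isUpdateComplete(listId, out messages), 5, TimeSpan.FromSeconds(30));`.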
The general objection against Sleep() is that it wastes a Thread.
In your case there is only 1 Thread (maybe 2) so that is not really a problem.
So I think it looks fine (but I would sleep 29 seconds to cut some slack).
It's fine, except that you cannot interrupt it once it goes into sleep, without aborting the thread (which is not recommended).
That's why a ManualResetEvent might be a better idea, since it can be signalled (woken up) from a different thread.
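A sketch of an interruptible wait built on ManualResetEvent, as suggested above. InterruptiblePoller, RequestStop and WaitOrStop are hypothetical names; the key call is WaitHandle.WaitOne(TimeSpan), which returns true if the event was signalled and false if the timeout elapsed.

```csharp
using System;
using System.Threading;

public sealed class InterruptiblePoller : IDisposable
{
    private readonly ManualResetEvent _stop = new ManualResetEvent(false);

    // Called from another thread to wake the waiting thread immediately.
    public void RequestStop() => _stop.Set();

    // Waits for 'interval' or until RequestStop() is called, whichever comes first.
    // Returns true if a stop was requested, false if the full interval elapsed.
    public bool WaitOrStop(TimeSpan interval) => _stop.WaitOne(interval);

    public void Dispose() => _stop.Dispose();
}
```

In the polling loop, `if (poller.WaitOrStop(TimeSpan.FromSeconds(30))) break;` would replace `Thread.Sleep(30000);`.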
You could stick with the Thread.Sleep method. But it would be more elegant to schedule the program to run every 30 minutes, so you don't have to handle the waiting inside your application.
Thread.Sleep isn't the best for executing periodic logic. Thread.Sleep(n) means your thread will relinquish control for n milliseconds. There is no guarantee that it will regain control after n milliseconds, it depends on the CPU load.
If you are blocking the thread for 30 minutes, you should schedule a Windows task to run every 30 minutes, so the program executes and then ends. That way you are not blocking a thread for so long.
For shorter times, like 30 seconds or 1 minute, System.Threading.Thread.Sleep() is perfectly fine. For more than 5 minutes I would use a Windows scheduled task. (I'm Spanish, so I think that's what they're called in the English version; I mean the tasks you schedule from the Control Panel ;-) )
I am making multiple requests to a website. How can I introduce a delay between requests to slow down my process? Is there a method that allows me to just cause the thread to wait for X seconds before proceeding?
Are you looking for Thread.Sleep?
Note that it causes the current thread to sleep - you don't target it at a different thread. But if you've got a loop within one thread, making multiple requests, you could easily add it in to the loop to restrict your request rate.
The Sleep method is overloaded - one signature takes a TimeSpan and the other takes a number of milliseconds. Personally I'd generally prefer the first one, as it leaves no room for ambiguity. For example:
Thread.Sleep(TimeSpan.FromSeconds(2));
is obviously asking the thread to sleep for 2 seconds - not 2 minutes, 2 milliseconds etc.
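A minimal sketch of adding the delay inside a request loop, assuming the request itself can be passed in as a delegate (Throttled and ForEach are hypothetical names; in real code the delegate would make the HTTP call). It sleeps between consecutive requests, not after the last one, using the TimeSpan overload recommended above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class Throttled
{
    // Runs 'request' for each item, pausing 'delay' between consecutive requests.
    public static void ForEach<T>(IEnumerable<T> items, TimeSpan delay, Action<T> request)
    {
        bool first = true;
        foreach (T item in items)
        {
            if (!first)
            {
                Thread.Sleep(delay); // blocks only the current thread
            }
            first = false;
            request(item);
        }
    }
}
```

For example, `Throttled.ForEach(urls, TimeSpan.FromSeconds(2), url => Download(url));`, where urls and Download are placeholders for your own list and request method.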
System.Threading.Thread.Sleep(x);
…where x is the number of milliseconds to sleep the thread.
An alternative is to use a timer and make a request each time it fires (for example, on the Tick event of a Windows Forms Timer).
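A sketch of the timer approach using System.Threading.Timer, which invokes a callback on a thread-pool thread rather than raising a Tick event. TimedRequests and Run are hypothetical names, and the request delegate stands in for the real call. Before returning, it disposes the timer with Timer.Dispose(WaitHandle) and waits, so no callback is still running when the CountdownEvent is disposed.

```csharp
using System;
using System.Threading;

public static class TimedRequests
{
    // Fires 'request' every 'period' until 'count' requests have been made.
    // Returns false if they did not all complete within 'timeout'.
    public static bool Run(int count, TimeSpan period, Action<int> request, TimeSpan timeout)
    {
        using var done = new CountdownEvent(count);
        int issued = 0;
        var timer = new Timer(_ =>
        {
            int n = Interlocked.Increment(ref issued);
            if (n > count) return; // ignore ticks after the last request
            request(n);
            done.Signal();
        }, null, TimeSpan.Zero, period);

        bool finished = done.Wait(timeout);

        // Stop the timer and wait for any callback still running before 'done' is disposed.
        using (var callbacksDone = new ManualResetEvent(false))
        {
            if (timer.Dispose(callbacksDone))
            {
                callbacksDone.WaitOne();
            }
        }
        return finished;
    }
}
```

Note that timer callbacks can overlap if a request takes longer than the period; the loop-plus-Sleep approach never overlaps.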
I am reading http://www.mono-project.com/ThreadsBeginnersGuide.
The first example looks like this:
using System;
using System.Threading;
public class FirstUnsyncThreads {
    private int i = 0;
    public static void Main (string[] args) {
        FirstUnsyncThreads myThreads = new FirstUnsyncThreads ();
    }
    public FirstUnsyncThreads () {
        // Creating our two threads. The ThreadStart delegate points to
        // the method being run in a new thread.
        Thread firstRunner = new Thread (new ThreadStart (this.firstRun));
        Thread secondRunner = new Thread (new ThreadStart (this.secondRun));
        // Starting our two threads. Thread.Sleep(10) gives the first Thread
        // 10 milliseconds more time.
        firstRunner.Start ();
        Thread.Sleep (10);
        secondRunner.Start ();
    }
    // This method is being executed on the first thread.
    public void firstRun () {
        while(this.i < 10) {
            Console.WriteLine ("First runner incrementing i from " + this.i +
                               " to " + ++this.i);
            // This prevents the first runner from doing all the work before
            // the second one has even started. (Happens on high performance
            // machines sometimes.)
            Thread.Sleep (100);
        }
    }
    // This method is being executed on the second thread.
    public void secondRun () {
        while(this.i < 10) {
            Console.WriteLine ("Second runner incrementing i from " + this.i +
                               " to " + ++this.i);
            Thread.Sleep (100);
        }
    }
}
Output:
First runner incrementing i from 0 to 1
Second runner incrementing i from 1 to 2
Second runner incrementing i from 3 to 4
First runner incrementing i from 2 to 3
Second runner incrementing i from 5 to 6
First runner incrementing i from 4 to 5
First runner incrementing i from 6 to 7
Second runner incrementing i from 7 to 8
Second runner incrementing i from 9 to 10
First runner incrementing i from 8 to 9
Wow, what is this? Unfortunately, the explanation in the article is inadequate for me. Can you explain why the increments happened in a jumbled order?
Thanks!
I think the writer of the article has confused things.
VoteyDisciple is correct that ++i is not atomic and that a race condition can occur if the target is not locked during the operation, but this will not cause the issue described above.
If a race condition occurs when calling ++i, the internal operations of the ++ operator will look something like:
1. 1st thread reads value 0
2. 2nd thread reads value 0
3. 1st thread increments value to 1
4. 2nd thread increments value to 1
5. 1st thread writes value 1
6. 2nd thread writes value 1
The order of operations 3 to 6 is unimportant; the point is that both read operations, 1 and 2, can occur while the variable has value x, so both threads increment it to the same value y, rather than each thread performing an increment from a distinct value.
This may result in the following output:
First runner incrementing i from 0 to 1
Second runner incrementing i from 0 to 1
What would be even worse is the following:
1. 1st thread reads value 0
2. 2nd thread reads value 0
3. 2nd thread increments value to 1
4. 2nd thread writes value 1
5. 2nd thread reads value 1
6. 2nd thread increments value to 2
7. 2nd thread writes value 2
8. 1st thread increments value to 1
9. 1st thread writes value 1
10. 2nd thread reads value 1
11. 2nd thread increments value to 2
12. 2nd thread writes value 2
This may result in the following output:
First runner incrementing i from 0 to 1
Second runner incrementing i from 0 to 1
Second runner incrementing i from 1 to 2
Second runner incrementing i from 1 to 2
And so on.
Furthermore, there is a possible race condition between reading i and performing ++i, since the Console.WriteLine call concatenates i and ++i. This may result in output like:
First runner incrementing i from 0 to 1
Second runner incrementing i from 1 to 3
First runner incrementing i from 1 to 2
The jumbled console output which the writer has described can only result from the unpredictability of the console output and has nothing to do with a race condition on the i variable. Taking a lock on i whilst performing ++i or whilst concatenating i and ++i will not change this behaviour.
When I run this (on a dual-core machine), my output is:
First runner incrementing i from 0 to 1
Second runner incrementing i from 1 to 2
First runner incrementing i from 2 to 3
Second runner incrementing i from 3 to 4
First runner incrementing i from 4 to 5
Second runner incrementing i from 5 to 6
First runner incrementing i from 6 to 7
Second runner incrementing i from 7 to 8
First runner incrementing i from 8 to 9
Second runner incrementing i from 9 to 10
As I would have expected. You are running two loops, both executing Sleep(100). That is very ill suited to demonstrate a race-condition.
The code does have a race condition (as VoteyDisciple describes) but it is very unlikely to surface.
I can't explain the lack of order in your output (is it a real output?), but the Console class will synchronize output calls.
If you leave out the Sleep() calls and run the loops 1000 times (instead of 10) you might see two runners both incrementing from 554 to 555 or something.
Synchronization is essential when multiple threads are present. In this case both threads read and write this.i, but no attempt is made to synchronize these accesses. Since both of them concurrently modify the same memory location, you observe the jumbled output.
The call to Sleep is dangerous; it is an approach that is sure to lead to bugs. You cannot assume that the threads will always be offset by the initial 10 ms.
In short: never use Sleep for synchronization :-) but instead adopt a proper thread synchronization technique (e.g. locks, mutexes, semaphores). Always try to use the lightest possible lock that will fulfill your need....
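A sketch of the two runners synchronized with a lock instead of relying on Sleep, assuming it is acceptable to record the messages in a list rather than printing them (calling Console.WriteLine inside the same lock would work the same way). SyncRunners, Run and RunBoth are hypothetical names. The check, the increment, and the message are all done under one lock, so each message matches exactly one increment and i never passes 10.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class SyncRunners
{
    private readonly object _gate = new object();
    private readonly List<string> _log = new List<string>();
    private int _i;

    // Check, increment, and record under one lock: no value is skipped or repeated.
    private void Run(string name)
    {
        while (true)
        {
            lock (_gate)
            {
                if (_i >= 10) return;
                int from = _i;
                int to = ++_i;
                _log.Add($"{name} runner incrementing i from {from} to {to}");
            }
            Thread.Sleep(1); // optional: gives the other runner a chance; not needed for correctness
        }
    }

    public List<string> RunBoth()
    {
        var first = new Thread(() => Run("First"));
        var second = new Thread(() => Run("Second"));
        first.Start();
        second.Start();
        first.Join();
        second.Join();
        return _log;
    }
}
```

Which runner performs a given increment still varies between runs; the lock only guarantees that the log is consistent.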
A useful resource is the book by Joe Duffy, Concurrent Programming on Windows.
The increments are not happening out of order, the Console.WriteLine(...) is writing the output from multiple threads into a single-threaded console, and the synchronization from many threads to one thread is causing the messages to appear out of order.
I assume this example attempted to create a race condition and, in your case, failed. Unfortunately, concurrency issues such as race conditions and deadlocks are hard to predict and reproduce by their nature. You might want to run it a few more times, or alter it to use more threads and have each thread increment more times (say 100,000). Then you might see that the end result does not equal the sum of all the increments (caused by a race condition).
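A sketch of that experiment, assuming plain threads and a shared counter captured by the thread delegates (RaceDemo and Count are hypothetical names). With useInterlocked set to false, counter++ is an unsynchronized read-modify-write, so lost updates can make the total come out below threads * perThread; with Interlocked.Increment the total is always exact.

```csharp
using System;
using System.Threading;

public static class RaceDemo
{
    // Starts 'threads' threads that each increment a shared counter 'perThread' times.
    public static int Count(int threads, int perThread, bool useInterlocked)
    {
        int counter = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int k = 0; k < perThread; k++)
                {
                    if (useInterlocked) Interlocked.Increment(ref counter);
                    else counter++; // racy on purpose: read, add, write are separate steps
                }
            });
            workers[t].Start();
        }
        foreach (Thread w in workers) w.Join();
        return counter;
    }
}
```

RaceDemo.Count(4, 100_000, true) always returns 400,000; the racy variant often returns less on a multi-core machine, but it is not guaranteed to on any particular run.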
Here is my sample program for the web service server side and client side. I've run into a strange performance problem: even if I increase the number of threads calling the web service, performance does not improve. At the same time, the CPU, memory, and network usage shown in Task Manager's Performance tab are low. I am wondering what the bottleneck is and how to improve it.
(In my tests, doubling the number of threads almost doubles the total response time.)
Client side:
using System;
using System.Threading;
class Program
{
    static Service1[] clients = null;
    static Thread[] threads = null;
    static void ThreadJob(object index)
    {
        // query 100 times
        for (int i = 0; i < 100; i++)
        {
            clients[(int)index].HelloWorld();
        }
    }
    static void Main(string[] args)
    {
        Console.WriteLine("Specify number of threads: ");
        int number = Int32.Parse(Console.ReadLine());
        clients = new Service1[number];
        threads = new Thread[number];
        for (int i = 0; i < number; i++)
        {
            clients[i] = new Service1();
            ParameterizedThreadStart starter = new ParameterizedThreadStart(ThreadJob);
            threads[i] = new Thread(starter);
        }
        DateTime begin = DateTime.Now;
        for (int i = 0; i < number; i++)
        {
            threads[i].Start(i);
        }
        for (int i = 0; i < number; i++)
        {
            threads[i].Join();
        }
        Console.WriteLine("Total elapsed time (s): " + (DateTime.Now - begin).TotalSeconds);
        return;
    }
}
Server side:
[WebMethod]
public double HelloWorld()
{
    return new Random().NextDouble();
}
Thanks in advance,
George
Although you are creating a multithreaded client, bear in mind that .NET has a configurable bottleneck of 2 simultaneous calls to a single host. This is by design.
Note that this is on the client, not the server.
Try adjusting your app.config file in the client:
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="20" />
  </connectionManagement>
</system.net>
There is some more info on this in this short article:
My experience is generally that locking is the problem: I had a massively parallel server once that spent more time context switching than it did performing work.
So check your memory and process counters in perfmon: if you look at context switches and the rate is high (more than 4000 per second), then you're in trouble.
You can also check the memory stats on the server: if it's spending all its time swapping, or just creating and freeing strings, it'll appear to stall as well.
Lastly, check disk I/O, same reason as above.
The resolution is to remove your locks, or hold them for a minimum of time. Our problem was solved by removing the dependence on COM BSTRs and their global lock, you'll find that C# has plenty of similar synchronisation bottlenecks (intended to keep your code working safely). I've seen performance drop when I moved a simple C# app from a single-core to a multi-core box.
If you cannot remove the locks, the best option is not to create as many threads :) Use a thread pool instead to let the CPU finish one job before starting another.
I don't believe that you are running into a bottleneck at all actually.
Did you try what I suggested?
Your idea is to add more threads to improve performance, because you are expecting that all of your threads will run perfectly in parallel. This is why you are assuming that doubling the number of threads should not double the total test time.
Your service takes a fraction of a second to return and your threads will not all start working at exactly the same instant in time on the client.
So your threads are not actually working completely in parallel as you have assumed, and the results you are seeing are to be expected.
You are not seeing any performance gain because there is none to be had. The one line of code in your service (below) probably executes without a context switch most of the time anyway.
return new Random().NextDouble();
The overhead involved in the web service call is higher than the work you are doing inside it. If you have some substantial work to do inside the service (database calls, look-ups, file access, etc.), you may begin to see some performance increase.
Just parallelizing a task will not automatically make it faster.
-Jason
Of course adding Sleep will not improve performance.
But the point of the test is to test with a variable number of threads.
So, keep the Sleep in your WebMethod.
And try now with 5, 10, 20 threads.
If there are no other problems with your code, then the increase in time should not be linear as before.
You realize that in your test, when you double the amount of threads, you are doubling the amount of work that is being done. So if your threads are not truly executing in parallel, then you will, of course, see a linear increase in total time...
I ran a simple test using your client code (with a sleep on the service).
For 5 threads, I saw a total time of about 53 seconds.
And for 10 threads, 62 seconds.
So for 2x the number of calls to the web service, it only took 17% more time. That is what you are expecting, no?
Well, in this case you're not really dividing your work among the chosen number of threads... Each thread you create performs the same job. So if you create n threads and you have limited parallel processing capacity, performance naturally decreases. Another thing I notice is that the job is a relatively fast operation for 100 iterations, and even if you plan on dividing it among multiple threads, you need to consider that the time spent on context switching and thread creation/deletion will be an important factor in the overall time.
As Bruno mentioned, your web method is a very quick operation. As an experiment, try making your HelloWorld method take a bit longer: throw in a Thread.Sleep(1000) before you return the random double. This will make it more likely that your service is actually forced to process requests in parallel.
Then try your client with different amounts of threads, and see how the performance differs.
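A client-side timing harness for that experiment, as a sketch: it mirrors the question's client (N threads, each making several calls), but the call is passed in as a delegate so it can be replaced by a simulated slow call. LoadTest and Measure are hypothetical names, and Stopwatch is used instead of DateTime.Now because it is intended for measuring elapsed time.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class LoadTest
{
    // 'threads' threads each make 'callsPerThread' calls; 'call' stands in for
    // clients[i].HelloWorld(). Returns the total wall-clock time.
    public static TimeSpan Measure(int threads, int callsPerThread, Action call)
    {
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int k = 0; k < callsPerThread; k++) call();
            });
        }
        Stopwatch sw = Stopwatch.StartNew();
        foreach (Thread w in workers) w.Start();
        foreach (Thread w in workers) w.Join();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

With a blocking call such as `() => Thread.Sleep(50)`, four threads making two calls each finish in roughly 100 ms rather than the 400 ms a serial run would take, which is the kind of non-linear scaling you should see once the service does real (blocking) work.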
Try using some processor-consuming work instead of Thread.Sleep; actually, a combination of both is best.
Sleep just hands the thread's time slice to another thread.
The IIS app pool's "Maximum Worker Processes" setting is 1 by default. For some reason, each worker process is limited to processing 10 service calls at a time. My WCF async server-side function only does Sleep(10*1000);.
This is what happens when Maximum Worker Processes = 1
http://s4.postimg.org/4qc26cc65/image.png
alternatively
http://i.imgur.com/C5FPbpQ.png?1
(This is my first post on SO, so I had to combine all the pictures into one picture.)
The client is making 48 async WCF WS calls in this test (using 16 processes). Ideally this should take ~10 seconds to complete (Sleep(10000)), but it takes 52 seconds. You can see 5 horizontal lines in the perfmon picture (above link) (using perfmon for monitoring Web Service Current Connections in server). Each horizontal line lasts 10 seconds (which Sleep(10000) does). There are 5 horizontal lines because the server processes 10 calls each time then closes that 10 connections (this happens 5 times to process 48 calls). Completion of all calls took 52 seconds.
After setting Maximum Worker Processes = 2
(in the same picture given above)
This time there are 3 horizontal lines because the server processes 20 calls each time then closes that 20 connections (this happens 3 times to process 48 calls). Took 33 secs.
After setting Maximum Worker Processes = 3
(in the same picture given above)
This time there are 2 horizontal lines because the server processes 30 calls each time. (happens 2 times to process 48 calls) Took 24 seconds.
After setting Maximum Worker Processes = 8
(in the same picture given above)
This time there is 1 horizontal line because the server processes 80 calls each time. (happens once to process 48 calls) Took 14 seconds.
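The measurements above fit a simple batch model. This is a back-of-the-envelope sketch, assuming (as the post reports) that each worker process serves at most 10 requests at once and each call blocks for 10 seconds; IisBatchModel and EstimatedSeconds are hypothetical names.

```csharp
using System;

public static class IisBatchModel
{
    // Each worker process serves at most 'perWorker' requests at once and every
    // call blocks for 'callSeconds', so calls are processed in ceil(calls / capacity) batches.
    public static int EstimatedSeconds(int calls, int workers, int perWorker = 10, int callSeconds = 10)
    {
        int concurrent = workers * perWorker;
        int batches = (calls + concurrent - 1) / concurrent; // ceiling division
        return batches * callSeconds;
    }
}
```

For 48 calls this gives 50, 30, 20 and 10 seconds for 1, 2, 3 and 8 worker processes; the measured 52, 33, 24 and 14 seconds are each a few seconds above the model.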
If you don't account for this, your parallel (async or threaded) client calls will be queued by the server in batches of 10, so beyond 10 concurrent calls, your requests won't be processed by the server in parallel.
PS: I was using Windows 8 x64 with IIS 8.5. The 10 concurrent request limit applies to workstation editions of Windows. Server editions don't have that limit, according to another post on SO (I can't link it because my reputation is below 10).