Lately I noticed a huge CPU performance issue when my web app receives a high volume of HTTP requests.
So I tried the Performance Profiler.
The Visual Studio performance profiler shows a high amount of CPU usage in the ThreadPoolWorkQueue.Dispatch and DomainBoundILStubClass.IL_STUB_COMtoCLR methods for an ASP.NET Entity Framework application. The CPU usage seems to increase with the number of incoming HTTP requests. I'm concerned this could be a thread pool management issue.
I've reviewed the code and found no place where it is weak or badly implemented, but I still could not find a solution.
Is there any suggestion for where I can start to improve the performance?
In Java webapps, I often do CPU sampling through jvisualvm to distinguish database bottlenecks from other processing in the application. Long-running database queries will show up as time spent within the JDBC driver's classes and methods.
It looks like ASP.NET tooling and database drivers don't work the same way. If I profile with CPU sampling, I don't see any of the time spent in methods while waiting for I/O. If I want to compare database and application bottlenecks side-by-side, I need to use instrumentation.
In other words, CPU samples are not an approximation -- they are fundamentally a different metric.
Is this because of a difference in how the CPU sampling process works in Java vs. the CLR, or is it a difference in how the database drivers work? Or a difference in how the CLR vs. JVM treat time spent waiting for I/O?
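To make the distinction concrete, here is a small Java sketch (it does not describe how the CLR or any particular profiler samples) that measures the same call two ways: wall-clock time with System.nanoTime() and on-CPU time for the current thread with ThreadMXBean.getCurrentThreadCpuTime(). A thread parked in Thread.sleep(), which stands in here for a blocking database call, accumulates wall time but almost no CPU time, which is why a purely CPU-based view can hide I/O waits. The class and helper names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuVsWallTime {
    // Written by busyWork() so the JIT cannot discard the loop.
    static volatile long sink;

    // Runs `task` on the current thread and returns {wallNanos, cpuNanos}.
    static long[] measure(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported on this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long wallStart = System.nanoTime();
        long cpuStart = mx.getCurrentThreadCpuTime();
        task.run();
        long cpuEnd = mx.getCurrentThreadCpuTime();
        long wallEnd = System.nanoTime();
        return new long[] {wallEnd - wallStart, cpuEnd - cpuStart};
    }

    // Hypothetical stand-in for a blocking database call: the thread waits but burns no CPU.
    static void simulatedIoWait() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Pure computation: wall time and CPU time should be roughly equal.
    static void busyWork() {
        long x = 0;
        for (int i = 0; i < 50_000_000; i++) {
            x += i % 7;
        }
        sink = x;
    }

    public static void main(String[] args) {
        long[] io = measure(CpuVsWallTime::simulatedIoWait);
        long[] cpu = measure(CpuVsWallTime::busyWork);
        System.out.printf("sleep: wall=%d ms, cpu=%d ms%n", io[0] / 1_000_000, io[1] / 1_000_000);
        System.out.printf("loop:  wall=%d ms, cpu=%d ms%n", cpu[0] / 1_000_000, cpu[1] / 1_000_000);
    }
}
```

Typically the sleep line shows about 200 ms of wall time and close to zero CPU time, while the loop shows the two numbers roughly equal.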
I have an ASP.NET project with many reports. Some of my reports involve heavy calculations that I perform in memory using LINQ. When I test these reports on my client machine, CPU usage is about 25%.
My question is: why doesn't CPU usage increase to 80% or more?
When I publish this project to the server, will it behave the same way?
You have 4 cores (or 2 hyper-threaded cores), meaning a single thread can use at most 25% of the total computing power (which is shown as 25% CPU in Task Manager).
Your calculation is probably single threaded.
Can you possibly break your calculation into several threads? That'll spread the load across the cores of your CPU a little more evenly.
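As a rough illustration of splitting a CPU-bound calculation across cores, here is a Java sketch. expensive() is a hypothetical stand-in for the per-row work of a report, and the parallel version uses a parallel stream, which runs on the common ForkJoinPool (by default about one worker per core, with the calling thread helping). In .NET, the closest equivalent for LINQ queries would be PLINQ's AsParallel(). Whether this actually speeds up a given report depends on how independent the rows are.

```java
import java.util.stream.LongStream;

class ParallelReport {
    // Hypothetical stand-in for the heavy per-row work of one report.
    static double expensive(long row) {
        double acc = 0;
        for (int k = 1; k <= 200; k++) {
            acc += Math.sqrt((double) row * k) / k;
        }
        return acc;
    }

    // Runs on the calling thread only, so at most one core is busy.
    static double sequential(long rows) {
        return LongStream.range(0, rows).mapToDouble(ParallelReport::expensive).sum();
    }

    // Splits the rows across the common ForkJoinPool so several cores share the work.
    static double parallel(long rows) {
        return LongStream.range(0, rows).parallel().mapToDouble(ParallelReport::expensive).sum();
    }

    public static void main(String[] args) {
        long rows = 1_000_000;
        System.out.println("cores available: " + Runtime.getRuntime().availableProcessors());
        long t0 = System.nanoTime();
        double a = sequential(rows);
        long t1 = System.nanoTime();
        double b = parallel(rows);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %.1f in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("parallel:   %.1f in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

The two sums can differ in the last few bits, because floating-point addition is not associative and the parallel version adds in a different order.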
I'm running the two-camera test samples provided with AForge.NET on my dual-core 2.0 GHz laptop with 2 GB of RAM. As soon as the application starts displaying video from the two cameras, it consumes 60% to 70% of the total CPU. Can anyone tell me why it consumes that much CPU and how I can avoid it? I have to build a similar application that needs vision from two cameras, and I will be using C#.
Could be a lot of things. Turn on a profiler and find out where the time is spent. Then adjust your question!
I'm facing a really strange problem with a .Net service.
I developed a multithreaded x64 Windows service.
I tested this service on an x64 server with 8 cores. The performance was great!
Now I have moved the service to a production server (x64, 32 cores). During testing I found that the performance is at least 10 times worse than on the test server.
I've checked loads of performance counters trying to find a reason for this poor performance, but I couldn't pinpoint anything.
Could it be a GC problem? Have you ever faced a problem like this?
Thank you in advance!
Alexandre
This is a common problem which people are generally unaware of, because very few people have experience on many-CPU machines.
The basic problem is contention.
As the CPU count increases, contention increases in all shared data structures. For low CPU counts, contention is low and the fact you have multiple CPUs improves performance. As the CPU count becomes significantly larger, contention begins to drown out your performance improvements; as the CPU count becomes large, contention actually starts reducing performance below that of a lower number of CPUs.
You are basically facing one of the aspects of the scalability problem.
I'm not sure, however, where this problem lies: in your data structures, or in the operating system's data structures. The former you can address; lock-free data structures are an excellent, highly scalable approach. The latter is difficult, since it essentially requires avoiding certain OS functionality.
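A small Java sketch of the contention effect described above: every thread increments one shared counter, first through a single synchronized lock and then through java.util.concurrent.atomic.LongAdder, which spreads updates over several internal cells so threads rarely collide on the same memory location. The class and method names are illustrative, and the timing includes thread start-up; the gap you see depends heavily on the hardware and thread count, so treat the printed numbers as a rough comparison only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class ContentionDemo {
    // Every increment takes the same monitor, so all threads queue up on one lock.
    static final class LockedCounter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Runs `threads` workers that each call `op` `perThread` times; returns elapsed ms.
    static long run(int threads, int perThread, Runnable op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    op.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int perThread = 1_000_000;
        LockedCounter locked = new LockedCounter();
        LongAdder striped = new LongAdder();
        long lockedMs = run(threads, perThread, locked::increment);
        long stripedMs = run(threads, perThread, striped::increment);
        System.out.printf("%d threads: synchronized %d ms (count %d), LongAdder %d ms (count %d)%n",
                threads, lockedMs, locked.get(), stripedMs, striped.sum());
    }
}
```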
There are way too many variables to know why one machine is slower than the other. 32-core machines are usually more specialized, whereas an 8-core machine could just be a dual-processor quad-core box. Are there VMs or other things running at the same time? Usually with that many cores, I/O bandwidth becomes the limiting factor (even if the CPUs still have plenty of capacity).
To start off, you should probably add lots of timers in your code (or profiling or whatever) to figure out what part of your code is taking up the most time.
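One way to add such timers without a profiler is a tiny accumulator keyed by section name. The Java sketch below is a hypothetical helper (SectionTimer is not a library class): it sums elapsed System.nanoTime() per section, is safe to call from many threads, and lists sections from most to least total time. Note that it measures wall time, so waits on locks or I/O inside a section are counted too. sleepMs() stands in for real work.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class SectionTimer {
    // Accumulated nanoseconds per named section; safe to update from many threads.
    private final Map<String, LongAdder> totals = new ConcurrentHashMap<>();

    // Runs `body`, adds its elapsed wall time to `section`, and returns its result.
    <T> T time(String section, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            totals.computeIfAbsent(section, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    long totalNanos(String section) {
        LongAdder total = totals.get(section);
        return total == null ? 0 : total.sum();
    }

    // Section names ordered from most to least accumulated time.
    List<String> hottestFirst() {
        return totals.entrySet().stream()
                .sorted(Comparator.comparingLong((Map.Entry<String, LongAdder> e) -> e.getValue().sum()).reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Hypothetical stand-in for real work.
    static Integer sleepMs(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        for (int i = 0; i < 5; i++) {
            timer.time("parse", () -> sleepMs(1));
            timer.time("query", () -> sleepMs(50));
        }
        for (String section : timer.hottestFirst()) {
            System.out.printf("%-6s %d ms%n", section, timer.totalNanos(section) / 1_000_000);
        }
    }
}
```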
Performance troubleshooting 101: what is the bottleneck (where in the code, and in which subsystem: memory, disk, or CPU)?
There are so many factors here:
are you actually using the cores?
are your extra threads causing locking issues to be more obvious?
do you not have enough memory to support all the extra stacks / data you can process?
can your IO (disk/network/database) stack keep up with the throughput?
etc
Could it be down to differences in memory or disk? If either of those were the bottleneck, you wouldn't get the benefit of the additional processing power. It's hard to tell without more details of your application and configuration.
With that many threads running concurrently, you're going to have to be really careful to avoid threads fighting each other for access to your data. Read up on non-blocking synchronization.
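A classic example of non-blocking synchronization is the Treiber stack: instead of taking a lock, push() and pop() read the current head, prepare the change, and publish it with a single compare-and-set, retrying if another thread got there first. The Java sketch below is a minimal version for illustration, not a replacement for a production concurrent collection; fillConcurrently() is a made-up test helper. Because every push allocates a fresh node and the garbage collector keeps a node alive while any thread still references it, this version sidesteps the ABA problem that a C or C++ implementation would have to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: push/pop use a compare-and-set retry loop instead of a lock.
class LockFreeStack<T> {
    private static final class Node<E> {
        final E value;
        final Node<E> next;
        Node(E value, Node<E> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        while (true) {
            Node<T> current = head.get();
            Node<T> node = new Node<>(value, current);
            // Succeeds only if no other thread changed head since we read it; otherwise retry.
            if (head.compareAndSet(current, node)) return;
        }
    }

    // Returns null when the stack is empty.
    T pop() {
        while (true) {
            Node<T> current = head.get();
            if (current == null) return null;
            if (head.compareAndSet(current, current.next)) return current.value;
        }
    }

    // Hypothetical helper: pushes `perThread` values from each of `threads` threads at once.
    static LockFreeStack<Integer> fillConcurrently(int threads, int perThread) {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) stack.push(i);
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stack;
    }

    public static void main(String[] args) {
        LockFreeStack<Integer> stack = fillConcurrently(8, 100_000);
        int count = 0;
        while (stack.pop() != null) count++;
        System.out.println("popped " + count + " values");
    }
}
```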
How many threads are you using? Using too many thread pool threads could cause thread starvation, which would make your program slower.
Some articles:
http://www2.sys-con.com/ITSG/virtualcd/Dotnet/archives/0112/gomez/index.html
http://codesith.blogspot.com/2007/03/thread-starvation-in-shared-thread-pool.html
(search for thread starvation in them)
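The thread-starvation pattern those articles describe can be reproduced with a fixed-size Java pool: if every pool thread is busy running a task that blocks waiting for another task queued on the same pool, no thread is left to run the queued work. In the sketch below, countStarved() is a made-up helper; each outer task waits at most 200 ms for its inner task and reports whether it gave up. The .NET thread pool is not fixed-size and injects extra threads when it detects starvation, so there the symptom tends to be slow, bursty throughput rather than a hard stall, but the blocking pattern behind it is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StarvationDemo {
    // Starts `blockingTasks` outer tasks on a fixed pool of `poolSize` threads. Once all of
    // them are running, each submits an inner task to the SAME pool and blocks waiting for it.
    // Returns how many outer tasks gave up because their inner task never got a thread.
    static int countStarved(int poolSize, int blockingTasks) {
        if (blockingTasks > poolSize) {
            throw new IllegalArgumentException("blockingTasks must be <= poolSize");
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allRunning = new CountDownLatch(blockingTasks);
        try {
            List<Future<Boolean>> outers = new ArrayList<>();
            for (int i = 0; i < blockingTasks; i++) {
                outers.add(pool.submit(() -> {
                    allRunning.countDown();
                    allRunning.await(); // every outer task now holds a pool thread
                    Future<String> inner = pool.submit(() -> "done");
                    try {
                        inner.get(200, TimeUnit.MILLISECONDS);
                        return false; // a spare thread ran the inner task
                    } catch (TimeoutException e) {
                        return true;  // starved: all threads are blocked waiting
                    }
                }));
            }
            int starved = 0;
            for (Future<Boolean> f : outers) {
                if (f.get()) starved++;
            }
            return starved;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // discards inner tasks still queued after a starvation run
        }
    }

    public static void main(String[] args) {
        System.out.println("4 threads, 4 blocking tasks: starved = " + countStarved(4, 4));
        System.out.println("5 threads, 4 blocking tasks: starved = " + countStarved(5, 4));
    }
}
```

With 4 threads and 4 blocking tasks, all four outer tasks should time out; adding a fifth thread gives the inner tasks somewhere to run, and none should.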
You could use a .NET profiler to find your bottlenecks; here is a good free one:
http://www.eqatec.com/tools/profiler
I agree with Blank, it's likely to be some form of contention. It's likely to be very hard to track down, unfortunately. It could be in your application code, the framework, the OS, or some combination thereof. Your application code is the most likely culprit, since Microsoft has expended significant effort on making the CLR and the OS scale on 32P boxes.
The contention could be in some hot locks, but it could be that some processor cache lines are sloshing back and forth between CPUs.
What's your metric for 10x worse? Throughput?
Have you tried booting the 32-proc box with fewer CPUs? Use the /NUMPROC option in boot.ini or BCDedit.
Do you achieve 100% CPU utilization? What's your context switch rate like? And how does this compare to the 8P box?