How to know when too much Parallel/Threads is killing performance?


I'm using several applications to process, send, and retrieve data on/from the web, and I'm wondering whether I can add more of these apps or whether there are already too many on this server. I use parallel processing, for example when I need to retrieve data from several sources. Most of my classes are async (because the new MongoDB driver's queries are async).
I'm wondering how many threads is "too many" for a CPU, and is there a way to figure it out?
Here is a screenshot of my server at the moment. I can see that there are 24 logical processors, 191 processes, and 3484 active threads! Could we say that this is too many? But CPU utilization is only 31%...
And, for example, on this server I had an app with a degree of parallelism of 20; I set it to 8 and performance was exactly the same...
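One way to answer "is there a way to figure it out?" empirically is to time the same workload at several degrees of parallelism and compare the results. The sketch below assumes the work can be expressed as an `Action<T>` over a list of items; `DopBenchmark` and `Sweep` are hypothetical names, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class DopBenchmark
{
    // Runs the same workload once per candidate degree of parallelism (DOP)
    // and returns the wall-clock time measured for each setting.
    public static Dictionary<int, TimeSpan> Sweep<T>(
        IReadOnlyList<T> items, Action<T> work, IEnumerable<int> degrees)
    {
        var results = new Dictionary<int, TimeSpan>();
        foreach (int dop in degrees)
        {
            // MaxDegreeOfParallelism caps how many items are processed concurrently.
            var options = new ParallelOptions { MaxDegreeOfParallelism = dop };
            var sw = Stopwatch.StartNew();
            Parallel.ForEach(items, options, work);
            sw.Stop();
            results[dop] = sw.Elapsed;
        }
        return results;
    }
}
```

For example, `DopBenchmark.Sweep(items, FetchAndStore, new[] { 1, 4, 8, 20 })`, where `FetchAndStore` is a hypothetical per-item method. If 8 and 20 come out about the same, as they did here, the bottleneck is probably outside the CPU (network, MongoDB, disk) rather than a shortage of threads. Run each setting more than once: the first pass includes JIT and connection warm-up, and the thread pool only adds threads gradually when work items block.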

Related

Azure SQL - DTU CPU usage disparity

I have been analyzing a DB running in Azure SQL that is performing VERY badly. It is on the Premium tier with 1750 DTUs available, and at times it can still max out its DTUs.
I've identified a variety of queries and terrible data access patterns in stored procs, and fixing them has reduced load. But there is still a massive disparity between DTU and CPU usage in the image below, whereas every other image I see of "Query Performance Insight" in Azure SQL shows the DTU line mostly aligned with CPU usage.
DTU (in red) to CPU usage per query
Looking at the C# app sitting on top of this: for each user of the app, it creates a SQL user and uses that user in the connection string to access the DB. This means connection pooling is effectively not being used (ADO.NET keeps a separate pool per distinct connection string), resulting in a massively larger number of active users/sessions on the Azure SQL DB. Could this be the sole reason for such high DTU usage?
Or could I be missing something regarding IO that isn't visible in the Azure portal?
Thanks
Neil
EDIT: Adding sessions and workers image!
Based on that, I'm not convinced now... What is the sessions percentage a percentage of? It's 10%, but 10% of what? The maximum allowed?
Edit2: Adding more metrics:
One week:
2-3 hours when load is high:
I believe the purple spike is the reindex, so it can be ignored!

Trying to understand DTU versus resources was a stumbling block for me too. Click on your Resource utilization chart and click Edit.
You then get a slider with a lot of resources you can monitor. Select Sessions percentage and Workers percentage; more than likely, one of these is your problem. If not, you can add CPU, Data IO, Log IO, and/or In-memory OLTP percentage. Hit OK.
Now you should be able to see the real cost of your query or queries. Learning how your queries consume the different resources can help you fix performance problems like these. I learned this when doing large inserts: I was maxing out my Log IO while everything else was at <5% utilization.
Try that, and if you are right about connection pooling, that will unfortunately mean some refactoring in the application. At the very least, this will give you more insight than the DTU percentage alone!

.NET Service - Analysing Development vs Test performance

I have a .NET REST API written in C# using MVC 5.
The API uses a repository that fire-hoses the necessary data from the database, then analyses it and transforms it into a usable model. The transformation uses a lot of LINQ to model the data.
On dev (Windows 10, 8-core i7 @ 3.7 GHz, 32 GB RAM), it takes 10 seconds for a large test range.
On a VM (Windows Server 2008 R2, 8 virtual Xeon cores @ 2.99 GHz, 8 GB RAM), it takes 300 seconds (5 minutes).
Neither exhausts memory, and neither is CPU-bound (CPU touches 50% on the VM and is barely noticeable on the dev box).
Same database, same code, etc.
The API uses async APIs to load some peripheral data while it does its primary job, so I guess I could add some logging to record timings.
What are the common techniques for tackling this problem? Can CPU speed really make that much difference?
thanks
EDIT:
Following Pieter's comment, I've increased the VM's memory to 12 GB and monitored the VM's performance while executing the operation. It's not the best visual aid (a Task Manager screenshot at the end of the operation), but it did show that the vCPUs never really went above ~60%, and memory, apart from a few MB at the beginning of the request, never went above 2.7 GB.
If IIS / .NET / my operation is not maxing out the resources, what is taking so long?
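A lightweight way to do the timing logs mentioned above is to wrap each phase (database load, LINQ transformation, async peripheral loads) in a `Stopwatch` and compare the per-phase numbers on the dev box and the VM. This is a sketch; `PhaseTimer` and the phase names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: records how long each named phase of a request takes so
// the dev box and the VM can be compared phase by phase instead of end to end.
public sealed class PhaseTimer
{
    private readonly List<(string Phase, TimeSpan Elapsed)> _phases = new();

    // Times a synchronous phase (e.g. the LINQ transformation).
    public T Time<T>(string phase, Func<T> body)
    {
        var sw = Stopwatch.StartNew();
        try { return body(); }
        finally { Record(phase, sw.Elapsed); }
    }

    // Times an asynchronous phase (e.g. loading peripheral data).
    public async Task<T> TimeAsync<T>(string phase, Func<Task<T>> body)
    {
        var sw = Stopwatch.StartNew();
        try { return await body(); }
        finally { Record(phase, sw.Elapsed); }
    }

    // Async phases may finish on different threads, so the list is locked.
    private void Record(string phase, TimeSpan elapsed)
    {
        lock (_phases) _phases.Add((phase, elapsed));
    }

    // Returns a snapshot copy so callers never see the list mid-update.
    public IReadOnlyList<(string Phase, TimeSpan Elapsed)> Phases
    {
        get { lock (_phases) return _phases.ToArray(); }
    }

    public string Report()
    {
        var lines = new List<string>();
        foreach (var (phase, elapsed) in Phases)
            lines.Add($"{phase}: {elapsed.TotalMilliseconds:F1} ms");
        return string.Join(Environment.NewLine, lines);
    }
}
```

Calls would look like `timer.Time("transform", () => Transform(rows))` or `await timer.TimeAsync("peripheral", LoadPeripheralAsync)` (both methods hypothetical), followed by logging `timer.Report()`. A phase that is 30x slower on the VM points at the resource that phase depends on (for example, latency to the database) rather than at raw CPU speed.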

Parallel Programing with Threads

Okay, I'm a bit confused about what I should do and how. I know the theory of parallel programming and threading, but here is my case:
We have a number of log files in a given folder, and we read these log files into a database. Reading them usually takes a couple of hours because we do it serially: we iterate through the files, open a SQL transaction for each file, insert its log entries into the database, then read the next file and do the same.
Now I'm thinking of using parallel programming so I can use all the CPU cores, but I'm still not clear whether using a thread for each file will make any difference to the system. If I create, say, 30 threads, will they run on a single core or in parallel? And how can I make them use all the cores, if they don't already?
EDIT: I'm using a single server with a 10K RPM HDD, a 4-core CPU, and 4 GB RAM, with no network operations; SQL Server is on the same machine, and the OS is Windows 2008 (I can change the OS if that helps too :)).
EDIT 2: I ran some tests based on your feedback to be sure; here is what I found on my quad-core i3 with 4 GB RAM:
CPU: CPU1 stays at 24-50%, CPU2 stays under 50%, CPU3 stays at 75%, and CPU4 stays around 0%. Yes, I have Visual Studio, an email client, and lots of other applications open, but this tells me the application is not using all cores, since CPU4 stays at 0%.
RAM stays constantly at 74% (it was around 50% before the test); that is how we designed the read, so nothing to worry about.
HDD read/write usage stays below 25%, rising to 25% in a sine-wave pattern, because our SQL transaction is first held in memory and then written to disk when memory reaches a threshold; so again, nothing to worry about.
So all resources are under-utilized here, and hence I think I can distribute the work to make it more efficient. Your thoughts again? Thanks.

First of all, you need to understand your code and why it is slow. If you're thinking something like “my code is slow and uses one CPU, so I'll just make it use all 4 CPUs and it will be 4 times faster”, then you're most likely wrong.
Using multiple threads makes sense if:
Your code (or at least a part of it) is CPU bound. That is, it's not slowed down by your disk, your network connection or your database server, it's slowed down by your CPU.
Or your code has multiple parts, each using a different resource. E.g. one part reads from a disk, another part converts the data, which requires lots of CPU and last part writes the data to a remote database. (Parallelizing this doesn't actually require multiple threads, but it's usually the simplest way to do it.)
From your description, it sounds like you could be in situation #2. A good solution for that is the producer-consumer pattern: a Stage 1 thread reads the data from the disk and puts it into a queue. A Stage 2 thread takes the data from the queue, processes it, and puts it into another queue. A Stage 3 thread takes the processed data from the second queue and saves it to the database.
In .NET 4.0, you would use BlockingCollection<T> for the queues between the threads. And when I say “thread”, I pretty much mean Task. In .NET 4.5, you could use blocks from TPL Dataflow instead of the threads.
If you do it this way then you can get up to three times faster execution (if each stage takes the same time). If Stage 2 is the slowest part, then you can get another speedup by using more than one thread for that stage (since it's CPU bound). The same could also apply to Stage 3, depending on your network connection and your database.
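A minimal sketch of that three-stage pipeline with `BlockingCollection<T>` and tasks follows. The caller supplies the stages as delegates: in the log-file case, `source` would enumerate the files' lines, `parse` would turn a line into a record, and `save` would insert it into SQL Server. `LogPipeline` and its parameter names are hypothetical, and this is one way to wire the pattern, not the answer author's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LogPipeline
{
    // Stage 1 reads raw items, Stage 2 parses them (CPU-bound, several workers),
    // Stage 3 saves parsed records (e.g. database inserts). Bounded queues apply
    // back-pressure so a fast stage cannot run far ahead of a slow one.
    public static void Run<TRaw, TParsed>(
        IEnumerable<TRaw> source,
        Func<TRaw, TParsed> parse,
        Action<TParsed> save,
        int parserCount = 2,
        int boundedCapacity = 1000)
    {
        using var raw = new BlockingCollection<TRaw>(boundedCapacity);
        using var parsed = new BlockingCollection<TParsed>(boundedCapacity);
        using var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;

        // If any stage fails, cancel the others so none blocks forever on a full queue.
        Task Stage(Action body) => Task.Run(() =>
        {
            try { body(); }
            catch { cts.Cancel(); throw; }
        });

        Task reader = Stage(() =>
        {
            try { foreach (TRaw item in source) raw.Add(item, token); }
            finally { raw.CompleteAdding(); }
        });

        Task[] parsers = Enumerable.Range(0, parserCount).Select(_ => Stage(() =>
        {
            foreach (TRaw item in raw.GetConsumingEnumerable(token))
                parsed.Add(parse(item), token);
        })).ToArray();

        // Close the second queue once every parser has finished (or failed).
        Task parsersDone = Task.WhenAll(parsers).ContinueWith(_ => parsed.CompleteAdding());

        Task writer = Stage(() =>
        {
            foreach (TParsed record in parsed.GetConsumingEnumerable(token))
                save(record);
        });

        // Waits for every task, then rethrows any stage failure as an AggregateException.
        Task.WaitAll(parsers.Append(reader).Append(parsersDone).Append(writer).ToArray());
    }
}
```

The bounded queues stop the reader from filling memory when the database is the slow stage. With more than one parser, records reach `save` in no particular order, so this assumes the inserts do not depend on order.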

There is no definite answer to this question, and you'll have to test, because as mentioned in my comments:
if the bottleneck is disk I/O, you won't gain much by adding more threads, and you might even worsen performance because more threads will be fighting for access to the disk
if you think disk I/O is OK but CPU load is the issue, you can add some threads, but no more than the number of cores, because here again things will get worse due to context switching
if you can do more disk and network I/O and CPU load is not high (very likely), you can oversubscribe with (far) more threads than cores: typically when your threads spend much of their time waiting for the database
So you should profile first, and then (or directly if you're in a hurry) test different configurations, but chances are you'll be in the third case. :)
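For the third case (work that mostly waits on the database), async I/O lets you oversubscribe without dedicating a thread to each operation. This sketch assumes the per-item work is an async method: a `SemaphoreSlim` caps how many operations are in flight, and that cap is the knob to tune by testing. `Throttle.ForEachAsync` is a hypothetical helper, not a framework method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Starts an async operation for every item but lets at most maxConcurrency
    // run at once. An operation awaiting I/O does not occupy a thread, so the
    // limit can sit well above Environment.ProcessorCount for I/O-bound work.
    public static async Task<TResult[]> ForEachAsync<T, TResult>(
        IEnumerable<T> items, Func<T, Task<TResult>> operation, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        List<Task<TResult>> tasks = items.Select(async item =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try { return await operation(item); }
            finally { gate.Release(); }      // hand the slot to the next item
        }).ToList();
        return await Task.WhenAll(tasks);    // results come back in input order
    }
}
```

On .NET 6 and later, `Parallel.ForEachAsync` with `ParallelOptions.MaxDegreeOfParallelism` covers the same need without a hand-written helper.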

First, you should check what is taking the time. If the CPU actually is the bottleneck, parallel processing will help. Maybe it's the network, and a faster network connection will help. Maybe buying a faster disk will help.
Find the problem before thinking about a solution.

Your problem is not that you aren't using all the CPUs; your operations are mainly I/O (reading files, sending data to the DB).
Using threads/Parallel will make your code run faster, since you will be processing many files at the same time.
To answer your question, the framework/OS will take care of spreading your code across the different cores.

It varies from machine to machine, but generally speaking, if you have a dual-core processor and two threads, the operating system will typically run one thread on each core. It doesn't matter how many cores you use; what matters is whether your approach is the fastest. If you want to make use of parallel programming, you need a way of sharing the workload that makes logical sense. You also need to consider where your bottleneck actually is. Depending on the size of the files, it may simply be the maximum read/write speed of the storage medium that is taking so long. As a test, I suggest you log where your code spends the most time.
A simple way to test whether a non-serial approach will help you is to sort your files in some order, divide the workload between two threads doing the same job simultaneously, and see if it makes a difference. If a second thread doesn't help you, then I guarantee 30 threads will only make it take longer, due to the OS having to switch threads back and forth.
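The two-thread experiment described above could look like the sketch below: sort the file names, split them into contiguous chunks, process each chunk on its own task, and compare the elapsed time for `workers = 1` against `workers = 2`. `SplitTest`, `TimeSplit`, and the `process` delegate (for example, a method that reads one log file and inserts it) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class SplitTest
{
    // Sorts the file names, splits them into `workers` contiguous chunks, runs
    // each chunk on its own task, and returns the total elapsed time.
    public static TimeSpan TimeSplit(IReadOnlyList<string> files, Action<string> process, int workers)
    {
        if (workers < 1) throw new ArgumentOutOfRangeException(nameof(workers));

        string[] sorted = files.OrderBy(f => f, StringComparer.Ordinal).ToArray();
        int chunkSize = (sorted.Length + workers - 1) / workers;   // ceiling division

        var sw = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, workers).Select(w => Task.Run(() =>
        {
            int start = w * chunkSize;
            int end = Math.Min(start + chunkSize, sorted.Length);
            for (int i = start; i < end; i++)
                process(sorted[i]);
        })).ToArray();
        Task.WaitAll(tasks);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

If `TimeSplit(files, process, 2)` is not clearly faster than `TimeSplit(files, process, 1)`, the shared resource (disk or database) is already saturated, and adding more workers is unlikely to help.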

Using the latest constructs in .NET 4 for parallel programming, threads are generally managed for you... have a read of the "getting started with parallel programming" guide
(much like what has happened more recently with the async versions of functions, which you use if you want your code to be async)
e.g.
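for (int i = 2; i < 20; i++)
{
    // SumRootN is a CPU-bound helper defined elsewhere (not shown)
    var result = SumRootN(i);
    Console.WriteLine("root {0} : {1} ", i, result);
}
becomes
Parallel.For(2, 20, (i) =>   // requires using System.Threading.Tasks;
{
    var result = SumRootN(i);
    // iterations run concurrently, so these lines may print in any order
    Console.WriteLine("root {0} : {1} ", i, result);
});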
EDIT: That said, it could be productive/faster to also put intensive tasks into separate threads... but manually making your application 'multi-core', with certain threads running on particular cores, isn't currently possible; that's all managed under the hood...
Have a look at PLINQ, for example,
and the .NET Parallel Extensions,
and look into
System.Diagnostics.Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)4; // bitmask 0b100: the whole process may only run on the third logical processor
Edit2:
Parallel processing can be done inside a single core with multiple threads.
Multi-Core processing means distributing those threads to make use of the multiple cores in a CPU.
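The `Parallel.For` example above calls a `SumRootN` method that the answer doesn't show. The self-contained sketch below assumes it is a CPU-bound loop (summing the root-th root of every integer up to a limit) and also shows the PLINQ equivalent the answer mentions. Results are written into an array by index, so their order is stable, unlike the `Console.WriteLine` inside the parallel loop. `RootSums` and the `limit` parameter are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class RootSums
{
    // Assumed stand-in for the answer's SumRootN: a CPU-bound loop that sums
    // the root-th root of every integer in [1, limit).
    public static double SumRootN(int root, int limit = 10_000_000)
    {
        double result = 0;
        for (int i = 1; i < limit; i++)
            result += Math.Exp(Math.Log(i) / root);
        return result;
    }

    // Parallel.For over roots [from, to); each iteration writes to its own slot,
    // so no locking is needed and the results come back in root order.
    public static double[] ComputeParallel(int from, int to, int limit = 10_000_000)
    {
        var results = new double[to - from];
        Parallel.For(from, to, i => results[i - from] = SumRootN(i, limit));
        return results;
    }

    // PLINQ version of the same computation; AsOrdered preserves input order.
    public static double[] ComputePlinq(int from, int to, int limit = 10_000_000) =>
        Enumerable.Range(from, to - from)
                  .AsParallel()
                  .AsOrdered()
                  .Select(i => SumRootN(i, limit))
                  .ToArray();
}
```

Printing after the loop, e.g. `var r = RootSums.ComputeParallel(2, 20);` followed by a plain `for` loop over `r`, keeps the serial version's output order while the heavy work still runs in parallel.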

Max Amount / LIMIT of ASP.NET sites on one server

My question is simple. About 2 years ago we began migrating to ASP.NET from ASP Classic.
Our issue is we currently have about 350 sites on a server and the server seems to be getting bogged down. We have been trying various things to improve performance, Query Optimizations, Disabling ViewState, Session State, etc and they have all worked, but as we add more sites we end up using more of the server's resources and so the improvements we made in code are virtually erased.
Basically, we're now at a tipping point: our CPUs currently average near 100%. Our IS department would like us to find new ways to rework the code on the sites to improve performance.
I have a theory, that we are simply at the limit on the amount of sites one server can handle.
Any ideas? Please only respond if you have a good idea of what you are talking about. I've heard a lot of people theorize about the situation. I need someone with actual knowledge of what might be going on.
Here are the details.
250 ASP.NET Sites
250 Admin Sites (Written in ASP.NET, basically they are backend admin sites)
100 Classic ASP Sites
Running on a virtualized Windows Server 2003.
3 CPUs, 4 GB Memory.
Memory stays around 3 - 3.5 GB
CPUs spike very badly; sometimes they stay near 100% for a short period of time (30-180 seconds).
The database is on a separate server and is SQL SERVER 2005.

It looks like you've reached that point. You've optimised your apps, you've looked at server performance, you can see you are hitting peak memory usage and maxing out the CPU, and, let's face it, administering so many websites mustn't be easy.
Also, the spec of your VM isn't fantastic. Its memory, in particular, potentially isn't great for the number of sites you have.
You have plenty of reasons to move.
However, some things to look at:
1) How many of those 250 sites are actually used? Which ones are the peak performance offenders? Those ones are prime candidates for being moved off onto their own box.
2) How many are not used at all? Can you retire any?
3) You are running on a virtual machine. What kind of virtual machine platform are you using? What other servers are running on that hardware?
4) What kind of redundancy do you currently have? 250 sites on one box with no backup? If you have a backup server, you could use that to round robin requests, or as a web farm, sharing the load.
Let's say you decide to move. The first thing you should probably think about is how.
Are you going to simply halve the number of sites? 125 + admins on one box, 125 + admins on the other? Or are you going to move the most used?
Or you could have several virtual machines, all active, as part of a web farm or load balanced system.
By the sounds of things, though, there's a real resistance to buy more hardware.
At some point, you are going to have to though, as sometimes, things just get old or get left behind. New servers have much more processing power and memory in the same space, and can be cheaper to run.
Oh, and one more thing. The cost of all those repeated optimizations and testing probably could easily be offset by buying more hardware. That's no excuse for not doing any optimization at all, of course, and I am impressed by the number of sites you are running, especially if you have a good number of users, but there is a balance, and I hope you can tilt towards the "more hardware" side of it some more.

I think you've answered your own question really. You've optimised the sites, you've got the database server on a different server. And you have 600 sites (250 + 250 + 100).
The answer is pretty clear to me. Buy a box with more memory and CPU power.

There is no real limit on the number of sites your server can handle, if all 600 sites had no users, you wouldn't have very much load on the server.
I think you might find a better answer at Server Fault, but here are my 2 cents.
You can scale up or scale out.
Scale up -- upgrade the machine with more memory / more cores in the CPU.
Scale out -- distribute the load by splitting the sites across 2 or more servers. 300 on server A, 300 on server B, or 200 each across 3 servers.
As @uadrive mentions, this is an issue of load, not of # of sites.

Just thinking this through, it seems like you would be better off measuring the # of users hitting the server instead of # of sites. You could have 300 sites and only half are used. Knowing the usage would be better in my mind.

There's no simple formula answer, like "you can have a maximum of 47.3 sites per gig of RAM". You could surely maintain performance with many more sites if each site had only one user per day. There are likely servers that have only two sites but performance is terrible because each hit requires a massive database query.
In practice, the only way to approach this is empirically: When performance starts to degrade, you have a problem. The fact that somebody wrote in a book somewhere that a server with such-and-such resources should be able to support more sites is of little value if, in practice, YOUR server can't support YOUR sites and YOUR users.
Realistic options are:
(a) Optimize your code and database queries. You say you've already done that. Maybe you can do more. It's unlikely that your code is now the absolute best that it can possibly be, but it may well be that the effort to find further improvements will be hugely expensive.
(b) Buy a bigger server.
(c) Break your sites across multiple servers, and either update DNS or install a front-end to map requests to the correct server.

Maxing out on CPU use can be a good sign, in the sense that moving to a larger server or dividing the sites between multiple servers is likely to help.
There are many things you can do to help improve performance and scalability (in fact, I've written a book on this subject -- see my profile).
It's difficult to make meaningful suggestions without knowing much more about your apps, but here are a few quick tips that might help to get you started:
Multiple AppPools are expensive. How many sites do you have per AppPool? Combine multiple sites per AppPool if you can
Minimize client round-trips: improve client and proxy-level caching, offload static files to a CDN, use image sprites, merge multiple CSS and JS files
Enable output caching on pages and/or controls where possible
Enable compression for static files (more CPU use on first access, but less after that)
Avoid Session state all together if you can (prefer cookies for state management). If you can't, then at least configure EnableSessionState="ReadOnly" session state for pages that don't need to write it, or "false" for pages that don't need it at all
Many things on the SQL Server side: caching, SqlCacheDependency, command batching, grouping multiple insert/update/deletes into a single transaction, using stored procedures instead of dynamic SQL, using async ADO.NET instead of LINQ or EF, make sure your DB logs are on separate spindles from data, etc
Look for algorithmic issues with your code; for example, hash tables are often better than linear searches, etc
Minimize cookie sizes, and only set cookies on pages, not on static content.
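To illustrate the hash-table tip: `List<T>.Contains` scans the list on every call, while `HashSet<T>.Contains` is a hash lookup that takes constant time on average. The names below are hypothetical; the point is only the change of collection type.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Linear search: List<T>.Contains walks the list on every call,
    // so m lookups against n entries cost O(n * m).
    public static int CountMatchesLinear(List<string> allowed, IEnumerable<string> requests) =>
        requests.Count(r => allowed.Contains(r));

    // Hash lookup: HashSet<T>.Contains is O(1) on average,
    // so the same m lookups cost O(m) after an O(n) build of the set.
    public static int CountMatchesHashed(HashSet<string> allowed, IEnumerable<string> requests) =>
        requests.Count(r => allowed.Contains(r));
}
```

Both methods return the same count; only the cost per lookup differs, which matters once the collection holds thousands of entries and is checked on every request.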
In addition, using a VM is likely to cost you up to about 10% in performance -- make sure it's really worth that for what it buys you in terms of improved manageability.

Multiple app instances, windows GDI limit

I'm trying to run hundreds of instances of the same app (using C#) simultaneously, and after about 200 instances the GUI starts to slow down dramatically, to the point that the load time of the next instance climbs to 20 s (from 1 s).
The test machine is:
Xeon 5520
12 GB RAM
Windows Server 2008 Web, 64-bit
At max load (200 instances), the CPU is at about 20% and RAM at 45%, so I'm sure it's not a hardware issue.
I already tried configuring the session size and SharedSection in the Windows registry, but it doesn't seem to help.
I also tried running the app in the background and in multiple (different) sessions, and it's still the same (I thought maybe it was a per-session limitation).
When the slowdown occurs in one session, for example, I can log in to another session and its desktop works without a problem (while the first desktop becomes unusable).
My question is: is there a way to strip the GDI objects, or maybe eliminate the use of the GUI? Or is it a Windows limitation?
P.S. I can't change the app since it's third-party.
Thanks in advance.
Thanks in advance.

With 200 instances running, the constant context switching is probably hurting performance. Context switching isn't counted in CPU load.
Edit: whoops, wrong link.
Try monitoring context switching on your system
http://technet.microsoft.com/en-us/library/cc938606.aspx

I doubt it's GDI: if you run out of GDI handles/resources, you'll notice vast chunks of your windows failing to redraw, rather than everything slowing down.
The most likely reason for a sudden drop in performance is that you are maxing out your RAM and thrashing your Virtual Memory as all your processes fight for CPU time. Check memory usage, and if it's high, see if you can reduce the footprint of your application. Or apply a "hardware fix" by installing more RAM. Or add Sleeps into your Apps where possible so that they aren't demanding constant timeslices from your CPU (and thus needing to be constantly paged in from VM).
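To check memory usage across all the instances rather than one at a time, you could sum the counters of every process with the app's name, as in this sketch. `InstanceMemory.Summarize` is a hypothetical helper; pass the third-party app's process name without `.exe`. If the private-bytes total approaches physical RAM, paging is plausible; if it stays well below, as the question's 45% suggests, the cause is more likely elsewhere, such as the context switching mentioned in the other answer.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

public static class InstanceMemory
{
    // Sums memory counters over every running process with the given name
    // (e.g. all instances of the third-party app). Compare the totals with the
    // machine's physical RAM to judge whether paging is plausible.
    public static (int Instances, long WorkingSetBytes, long PrivateBytes) Summarize(string processName)
    {
        Process[] procs = Process.GetProcessesByName(processName);
        int counted = 0;
        long workingSet = 0, privateBytes = 0;
        foreach (Process p in procs)
        {
            using (p)
            {
                try
                {
                    long ws = p.WorkingSet64;          // physical memory in use right now
                    long pb = p.PrivateMemorySize64;   // private (non-shared) memory
                    workingSet += ws;
                    privateBytes += pb;
                    counted++;
                }
                catch (InvalidOperationException) { }  // process exited meanwhile; skip it
                catch (Win32Exception) { }             // access denied; skip it
            }
        }
        return (counted, workingSet, privateBytes);
    }
}
```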
