Azure SQL - DTU/CPU usage disparity - C#

I have been analyzing a DB running in Azure SQL that is performing VERY badly. It is on the Premium tier with 1750 DTUs available, and at times it can still max out DTUs.
I've identified a variety of queries and terrible data access patterns through stored procs, which has reduced load. But there is still a massive disparity between DTU and CPU usage in the image below; every other image I see of "Query Performance Insight" in Azure SQL shows the DTU line aligning with the CPU usage for the most part.
DTU (in red) to CPU usage per query
Looking at the C# app sitting on top of this: for each user of the app, it creates a SQL user and uses that user in the connection string to access the DB. This means connection pooling is effectively not being used (pools are keyed on the connection string, so every user gets its own pool), resulting in a massively larger number of active users/sessions on the Azure SQL DB. Could this be the sole reason why there is such high DTU usage?
Or could I possibly be missing something regarding IO that isn't visible in the Azure portal?
Thanks
Neil
EDIT: Adding sessions and workers image!
Based on that I'm not convinced now... What is the session percentage a percentage of? It's 10%, but 10% of what? The max allowed?
Edit2: Adding more metrics:
One week:
2-3 hours when load is high:
The purple spike, I believe, is the reindex, so you can ignore that!

Trying to understand DTU versus resources was a stumbling block for me too. Click on your Resource utilization chart and click Edit.
Then you get a slider with a lot of resources you can monitor. Select Sessions and Workers percent. More than likely one of these is your problem. If not, you can add in CPU, Data IO, Log IO, and/or In-memory OLTP percentage. Hit OK.
Now you should see the real cost of your query or queries. Learning how your queries consume the different resources can help you fix performance problems like these. I learned this when doing large inserts: I was maxing out my Log IO while everything else was at <5% utilization.
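If you want the same counters outside the portal, the sys.dm_db_resource_stats DMV on the Azure SQL database exposes CPU, data IO, log IO, worker and session percentages at 15-second granularity for roughly the last hour. A small C# sketch (the connection string is a placeholder) could look like this:

```
using System;
using System.Data.SqlClient;

class ResourceStats
{
    static void Main()
    {
        // Placeholder connection string; point it at the database you are investigating.
        const string connectionString =
            "Server=tcp:your-server.database.windows.net;Database=YourDb;" +
            "User ID=monitor;Password=...;Encrypt=True;";

        // sys.dm_db_resource_stats: ~1 hour of history, one row per 15 seconds.
        const string sql = @"
            SELECT TOP (20)
                   end_time, avg_cpu_percent, avg_data_io_percent,
                   avg_log_write_percent, max_worker_percent, max_session_percent
            FROM sys.dm_db_resource_stats
            ORDER BY end_time DESC;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(
                        "{0:u}  cpu={1}%  dataIO={2}%  logIO={3}%  workers={4}%  sessions={5}%",
                        reader["end_time"],
                        reader["avg_cpu_percent"],
                        reader["avg_data_io_percent"],
                        reader["avg_log_write_percent"],
                        reader["max_worker_percent"],
                        reader["max_session_percent"]);
                }
            }
        }
    }
}
```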
Try that, and if you are right about connection pooling, unfortunately that will mean some refactoring in the application. At the very least, using this will give you more insight than just looking at that DTU percentage!
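If the per-user SQL logins do turn out to be the culprit, the usual refactoring is to connect with a single application login, so that every request shares one connection pool, and to pass the real end user separately. A rough sketch, assuming SQL Server 2016+/Azure SQL where sp_set_session_context is available; app_login and the connection string are placeholders:

```
using System.Data;
using System.Data.SqlClient;

public static class Db
{
    // One shared application login => one connection pool for everybody.
    // (Pools are keyed on the exact connection string text.)
    private const string ConnectionString =
        "Server=tcp:your-server.database.windows.net;Database=YourDb;" +
        "User ID=app_login;Password=...;Encrypt=True;";

    public static SqlConnection OpenForUser(string appUserName)
    {
        var connection = new SqlConnection(ConnectionString);
        connection.Open();

        // Tag the pooled connection with the real end user so procs/auditing
        // can read it via SESSION_CONTEXT(N'app_user').
        using (var command = new SqlCommand("sp_set_session_context", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@key", "app_user");
            command.Parameters.AddWithValue("@value", appUserName);
            command.ExecuteNonQuery();
        }

        return connection;
    }
}
```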

Related

Entity Framework with elastic pool. How to manage my SaaS client database?

I am currently looking to build a SaaS in ASP.NET hosted on Azure.
I am looking for advice on how to best build my database and the Entity Framework model that goes with it. Once a customer registers on the web app, the app needs to create a separate database for each customer on my Azure SQL server.
I have started looking into the option of elastic pooling, but it has left me quite confused. To tell you a bit about my database: there is one "meta" database for all general settings, and then each customer has a database with their portfolio.
Example:
Database [Settings] with tables (Currency, Stocks, Bonds)
[Customer1]: SomeFinanceProduct (Currency and Stock as foreign keys), SomeOtherFinanceProduct (Currency and Bond as foreign keys)
[Customer2]: SomeFinanceProduct (Currency and Stock as foreign keys), SomeOtherFinanceProduct (Currency and Bond as foreign keys)
[Customer3]: etc.
I would appreciate some help from more experienced developers. Many thanks, this is an important issue for me. I have also found this post from 2015 where they said that the solution would soon be released, but I have not found anything on the web.
I can't speak to the Entity Framework part of your question too much; however, I can speak to the elastic pool side of things.
It's important to note that an Azure elastic pool is just a billing and resource allocation construct. As far as your application or its code is concerned, there is no difference if you use an elastic pool. You still have a database, it lives on a server, and that server (and indirectly, but more specifically in the case of Azure, that database) has resource constraints.
In Azure, you traditionally create a database and choose a service or pricing tier. You pay X dollars in exchange for Y resources (CPU, memory, storage size, connection counts, etc.) for that database. You repeat this for every single database you create. Over time, databases grow in size or usage and become more demanding, so you must change the service tier of each database individually as this happens. As you have more and more databases, this becomes tedious and cost-ineffective.
With elastic pools, you can take any number of individual databases and drop the individual service/pricing plans and instead buy a big bucket of resources [i.e. the elastic pool] and give those resources to all of the databases. The theory is that collectively you need fewer resources with this approach, and this allows you to save money. It also makes better use of the resources you are buying.
The reason you need fewer resources is that databases generally experience peak demand at different times. When you buy resources individually, you have to over-buy on every single database to handle the peaks (which means you have a lot of wasted resources just sitting there unused). With an elastic pool, since all databases are in the pool together, you only buy enough extra resources to cover however many peaks you typically have going on concurrently; now you have fewer resources sitting idle wasting money.
As I mentioned, the other benefit of using elastic pools is that you can make better use of the resources you have. Consider a database which has very low demands placed upon it; you'd naturally purchase a small (and thus cheap) plan for it. Then consider a database which has high demands placed upon it; you'll likely buy a plan with much greater resources. Now, occasionally the low-use database gets some big hits. With the small plan, the resources aren't enough and performance degrades terribly. Meanwhile the other database has tons of resources, much of which are going unused. Wouldn't it be nice if the small database experiencing the unusual peak could borrow some of those resources for a few minutes? That's exactly what elastic pools do! Elastic pools have lots of built-in scalability wins for your applications!
The last important thing to note is that elastic pools cost more per unit of resource than regular databases. This means there is a break-even point, and it's more expensive to use elastic pools until you have enough databases to make it worthwhile. For my needs I've found 10-15 databases to be a fairly good break-even point. Once you have enough, create the pool. Then, as you add more databases to the pool later on, the "per database" costs go down even more.
--
So to get back to your question, elastic pools will not specifically affect your ability to use Entity Framework for your project. Regardless of whether you choose to pool your databases or not, you'll have to get your code to talk to the appropriate customer-specific database based on who is logged in.
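On the code side, the usual pattern is to resolve the tenant's database at request time and build the connection string from it; the rest of the EF code stays the same whether the databases are pooled or not. A rough sketch, where GetDatabaseNameForTenant is a placeholder for however you store the tenant-to-database mapping (e.g. in your [Settings] database):

```
using System.Data.Entity;
using System.Data.SqlClient;

public class SomeFinanceProduct
{
    public int Id { get; set; }
    public string CurrencyCode { get; set; }
}

public class CustomerContext : DbContext
{
    public CustomerContext(string connectionString) : base(connectionString) { }

    public DbSet<SomeFinanceProduct> FinanceProducts { get; set; }
}

public static class TenantConnections
{
    public static CustomerContext CreateContextForTenant(string tenantId)
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "tcp:your-server.database.windows.net",   // placeholder server
            InitialCatalog = GetDatabaseNameForTenant(tenantId),    // per-tenant database
            UserID = "app_login",                                   // placeholder login
            Password = "...",
            Encrypt = true
        };

        return new CustomerContext(builder.ConnectionString);
    }

    private static string GetDatabaseNameForTenant(string tenantId)
    {
        // Hypothetical lookup: map the logged-in tenant to its database name.
        return "Customer_" + tenantId;
    }
}
```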
You want an elastic pool with a shard per tenant.
This link describes the tools that are available for managing and querying sharded databases in a multi-tenant scenario. Follow the links in the first paragraph for details on each.

Loading multiple large ADO.NET DataTables/DataReaders - Performance improvements

I need to load the results of multiple SQL statements from SQL Server into DataTables. Most of the statements return some 10,000 to 100,000 records, and each takes up to a few seconds to load.
My guess is that this is simply due to the amount of data that needs to be shoved around. The statements themselves don't take much time to process.
So I tried to use Parallel.For() to load the data in parallel, hoping that the overall processing time would decrease. I do get a 10% performance increase, but that is not enough. A reason might be that my machine is only a dual-core, thus limiting the benefit here. The server on which the program will be deployed has 16 cores though.
My question is: how could I improve the performance further? Would the use of asynchronous data service queries (BeginExecute, etc.) be a better solution than PLINQ? Or maybe some other approach?
The SQL Server is running on the same machine. This is also the case on the deployment server.
EDIT:
I've run some tests using a DataReader instead of a DataTable. This already decreased the load times by about 50%. Great! Still, I am wondering whether parallel processing with BeginExecute would improve the overall load time if a multiprocessor machine is used. Does anybody have experience with this? Thanks for any help on this!
UPDATE:
I found that about half of the loading time was consumed by processing the SQL statement. In SQL Server Management Studio the statements took only a fraction of the time, but somehow they take much longer through ADO.NET. So by using DataReaders instead of loading DataTables, and by adapting the SQL statements, I've come down to about 25% of the initial loading time. Loading the DataReaders in parallel threads with Parallel.For() does not make an improvement here. So for now I am happy with the result and will leave it at that. Maybe when we update to .NET 4.5 I'll give asynchronous DataReader loading a try.
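For reference, the asynchronous DataReader version I have in mind for .NET 4.5 would be roughly this (table and column names here are just placeholders); it won't make a single query faster, but it frees threads while several queries are in flight:

```
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class Loader
{
    public static async Task<List<KeyValuePair<int, string>>> LoadAsync(string connectionString)
    {
        var rows = new List<KeyValuePair<int, string>>();

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Id, Name FROM dbo.SomeTable", connection))
        {
            await connection.OpenAsync();
            using (var reader = await command.ExecuteReaderAsync())
            {
                // Stream rows directly into a list instead of filling a DataTable.
                while (await reader.ReadAsync())
                {
                    rows.Add(new KeyValuePair<int, string>(reader.GetInt32(0), reader.GetString(1)));
                }
            }
        }

        return rows;
    }
}
```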
My guess is that this is simply due to the amount of data that needs to be shoved around.
No, it is due to using a SLOW framework. I am pulling nearly a million rows into a dictionary in less than 5 seconds in one of my apps. DataTables are SLOW.
You have to change the nature of the problem. Let's be honest: who needs to view 10,000 to 100,000 records per request? I think no one.
You need to consider handling paging, and in your case paging should be done on SQL Server. To make this clear, let's say you have a stored procedure named "GetRecords". Modify this stored procedure to accept a page parameter and return only the data relevant for that specific page (say, 100 records) plus the total page count. Inside the app, just show these 100 records (they will fly) and handle the selected page index.
Hope this helps, best regards!
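To make the idea concrete, here is a rough sketch with a made-up dbo.Records table, using ROW_NUMBER() so it also works before SQL Server 2012 (where OFFSET/FETCH became available); in your case the same logic would live inside the "GetRecords" stored procedure:

```
using System.Data;
using System.Data.SqlClient;

public static class PagedLoader
{
    public static DataTable LoadPage(string connectionString, int pageIndex, int pageSize, out int totalRows)
    {
        const string sql = @"
            SELECT COUNT(*) FROM dbo.Records;

            SELECT Id, Name, Amount
            FROM (
                SELECT Id, Name, Amount,
                       ROW_NUMBER() OVER (ORDER BY Id) AS RowNum
                FROM dbo.Records
            ) AS numbered
            WHERE RowNum BETWEEN (@PageIndex * @PageSize) + 1
                             AND (@PageIndex + 1) * @PageSize;";

        var page = new DataTable();

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@PageIndex", pageIndex);
            command.Parameters.AddWithValue("@PageSize", pageSize);

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                reader.Read();
                totalRows = reader.GetInt32(0);   // first result set: total row count

                reader.NextResult();              // second result set: the requested page
                page.Load(reader);
            }
        }

        return page;
    }
}
```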
Do you often have to load these requests? If so, why not use a distributed cache?
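If a distributed cache is an option, the cache-aside pattern is straightforward. This sketch uses StackExchange.Redis purely as an example (no particular cache is prescribed above), and LoadRowsFromSql plus the endpoint are placeholders:

```
using System;
using StackExchange.Redis;

public static class CachedQueries
{
    private static readonly ConnectionMultiplexer Redis =
        ConnectionMultiplexer.Connect("localhost:6379");   // placeholder endpoint

    public static string GetRecordsPage(int pageIndex)
    {
        IDatabase cache = Redis.GetDatabase();
        string key = "records:page:" + pageIndex;

        // Cache-aside: try the cache first, fall back to SQL Server, then populate the cache.
        string cached = cache.StringGet(key);
        if (cached != null)
            return cached;

        string fresh = LoadRowsFromSql(pageIndex);          // placeholder for the real query
        cache.StringSet(key, fresh, TimeSpan.FromMinutes(5));
        return fresh;
    }

    private static string LoadRowsFromSql(int pageIndex)
    {
        // Hypothetical: run the paged query and serialize the rows (e.g. to JSON).
        return "[]";
    }
}
```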

Query from database or from memory? Which is faster?

I am trying to improve the performance of a Windows Service, developed in C# and .NET 2.0, that processes a great amount of files. I want to process more files per second.
In its process, for each file, the service does a database query to retrieve some parameters of the system.
Those parameters change annually, and I am thinking that I would gain some performance if I loaded those parameters into a singleton and refreshed it periodically. Instead of making a database query for each file being processed, I would get the parameters from memory.
To complete the scenario: I am using Windows Server 2008 R2 64-bit, SQL Server 2008 as the database, and C# with .NET 2.0 as already mentioned.
Am I right in my approach? What would you do?
Thanks!
Those parameters change annually
Yes, do cache them in memory. Especially if they are large or complex.
You should take care to invalidate them at the right time once a year, depending how accurate that has to be.
Simply caching them for an hour or even for a few minutes might be a good compromise.
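A minimal time-based cache along these lines works even on .NET 2.0, no caching framework required; LoadParametersFromDatabase and SystemParameters below are placeholders for the query and the settings type you already have:

```
using System;

public sealed class SystemParametersCache
{
    private static readonly object SyncRoot = new object();
    private static SystemParameters _cached;
    private static DateTime _loadedAtUtc;

    // How stale we are willing to be; tune to taste.
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(30);

    public static SystemParameters Current
    {
        get
        {
            lock (SyncRoot)
            {
                if (_cached == null || DateTime.UtcNow - _loadedAtUtc > MaxAge)
                {
                    _cached = LoadParametersFromDatabase();   // your existing DB query
                    _loadedAtUtc = DateTime.UtcNow;
                }
                return _cached;
            }
        }
    }

    private static SystemParameters LoadParametersFromDatabase()
    {
        // Placeholder: run the same query you run per file today, just once per MaxAge.
        return new SystemParameters();
    }
}

public class SystemParameters
{
    // Whatever fields your per-file processing needs.
    public decimal SomeRate;
    public int SomeThreshold;
}
```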
RAM data access is definitely faster than any other data access, except for CPU-level storage such as registers and CPU cache.
Caching would be faster even if you refreshed it every minute, so yes, caching that query is much faster.
Crossing a network or going to disk is always orders of magnitude slower than in memory access.
Databases can cache data in memory, so if you can achieve that and you're not crossing a network, the database might be faster, since its data access patterns/indexes etc. may be faster than your code. But that's the best case: if you need it faster, in-memory caches help.
Be aware, though, that in-memory caches can add complexity and bugs. You have to determine the lifetime of the cached data and how to refresh it, and the more complex it is, the more weird edge-case state bugs you will have. Even though the parameters change annually, you have to handle that cusp.

Max Amount / LIMIT of ASP.NET sites on one server

My question is simple. About 2 years ago we began migrating to ASP.NET from ASP Classic.
Our issue is that we currently have about 350 sites on a server, and the server seems to be getting bogged down. We have been trying various things to improve performance (query optimizations, disabling ViewState, Session State, etc.), and they have all worked, but as we add more sites we end up using more of the server's resources, so the improvements we made in code are virtually erased.
Basically we're now at a tipping point: our CPUs currently average near 100%. Our IS department would like us to find new ways to rework the code on the sites to improve performance.
I have a theory that we are simply at the limit of the number of sites one server can handle.
Any ideas? Please only respond if you have a good idea of what you are talking about. I've heard a lot of people theorize about the situation. I need someone who has actual knowledge about what might be going on.
Here are the details.
250 ASP.NET Sites
250 Admin Sites (Written in ASP.NET, basically they are backend admin sites)
100 Classic ASP Sites
Running on a virtualized Windows Server 2003.
3 CPUs, 4 GB Memory.
Memory stays around 3 - 3.5 GB
CPUs spike very badly; sometimes they remain near 100% for short periods of time (30-180 seconds)
The database is on a separate server and is SQL Server 2005.
It looks like you've reached that point. You've optimised your apps, you've looked at server performance, you can see you are hitting peak memory usage and maxing out the CPU, and, let's face it, administering so many websites can't be easy.
Also, the spec of your VM isn't fantastic. Its memory, in particular, potentially isn't great for the number of sites you have.
You have plenty of reasons to move.
However, some things to look at:
1) How many of those 250 sites are actually used? Which ones are the peak performance offenders? Those ones are prime candidates for being moved off onto their own box.
2) How many are not used at all? Can you retire any?
3) You are running on a virtual machine. What kind of virtual machine platform are you using? What other servers are running on that hardware?
4) What kind of redundancy do you currently have? 250 sites on one box with no backup? If you have a backup server, you could use that to round robin requests, or as a web farm, sharing the load.
Let's say you decide to move. The first thing you should probably think about is how.
Are you going to simply halve the number of sites? 125 + admins on one box, 125 + admins on the other? Or are you going to move the most used?
Or you could have several virtual machines, all active, as part of a web farm or load balanced system.
By the sounds of things, though, there's a real resistance to buy more hardware.
At some point, you are going to have to though, as sometimes, things just get old or get left behind. New servers have much more processing power and memory in the same space, and can be cheaper to run.
Oh, and one more thing. The cost of all those repeated optimizations and testing probably could easily be offset by buying more hardware. That's no excuse for not doing any optimization at all, of course, and I am impressed by the number of sites you are running, especially if you have a good number of users, but there is a balance, and I hope you can tilt towards the "more hardware" side of it some more.
I think you've answered your own question really. You've optimised the sites, you've got the database server on a different server. And you have 600 sites (250 + 250 + 100).
The answer is pretty clear to me. Buy a box with more memory and CPU power.
There is no real limit on the number of sites your server can handle; if all 600 sites had no users, you wouldn't have much load on the server.
I think you might find a better answer at serverfault, but here are my 2 cents.
You can scale up or scale out.
Scale up -- upgrade the machine with more memory / more cores in the CPU.
Scale out -- distribute the load by splitting the sites across 2 or more servers. 300 on server A, 300 on server B, or 200 each across 3 servers.
As #uadrive mentions, this is an issue of load, not of # of sites.
Just thinking this through, it seems like you would be better off measuring the # of users hitting the server instead of # of sites. You could have 300 sites and only half are used. Knowing the usage would be better in my mind.
There's no simple formula answer, like "you can have a maximum of 47.3 sites per gig of RAM". You could surely maintain performance with many more sites if each site had only one user per day. There are likely servers that have only two sites but performance is terrible because each hit requires a massive database query.
In practice, the only way to approach this is empirically: When performance starts to degrade, you have a problem. The fact that somebody wrote in a book somewhere that a server with such-and-such resources should be able to support more sites is of little value if, in practice, YOUR server can't support YOUR sites and YOUR users.
Realistic options are:
(a) Optimize your code and database queries. You say you've already done that. Maybe you can do more. It's unlikely that your code is now the absolute best that it can possibly be, but it may well be that the effort to find further improvements will be hugely expensive.
(b) Buy a bigger server.
(c) Break your sites across multiple servers, and either update DNS or install a front-end to map requests to the correct server.
Maxing out on CPU use can be a good sign, in the sense that moving to a larger server, or dividing the sites between multiple servers, is likely to help.
There are many things you can do to help improve performance and scalability (in fact, I've written a book on this subject -- see my profile).
It's difficult to make meaningful suggestions without knowing much more about your apps, but here are a few quick tips that might help to get you started:
Multiple AppPools are expensive. How many sites do you have per AppPool? Combine multiple sites per AppPool if you can
Minimize client round-trips: improve client and proxy-level caching, offload static files to a CDN, use image sprites, merge multiple CSS and JS files
Enable output caching on pages and/or controls where possible (see the sketch after this list)
Enable compression for static files (more CPU use on first access, but less after that)
Avoid Session state altogether if you can (prefer cookies for state management). If you can't, then at least configure EnableSessionState="ReadOnly" for pages that don't need to write it, or "false" for pages that don't need it at all
Many things on the SQL Server side: caching, SqlCacheDependency, command batching, grouping multiple inserts/updates/deletes into a single transaction, using stored procedures instead of dynamic SQL, using async ADO.NET instead of LINQ or EF, making sure your DB logs are on separate spindles from data, etc.
Look for algorithmic issues with your code; for example, hash tables are often better than linear searches, etc
Minimize cookie sizes, and only set cookies on pages, not on static content.
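To make a couple of these concrete (output caching and in-memory caching of expensive query results), a rough code-behind sketch might look like this; the page, cache key, and LoadProductsFromDatabase method are made up:

```
using System;
using System.Web;
using System.Web.UI;

public partial class ProductList : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Roughly equivalent to an OutputCache directive with a 5-minute duration:
        // allow the ASP.NET output cache and downstream proxies to serve this response.
        Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(5));
        Response.Cache.SetCacheability(HttpCacheability.Public);
        Response.Cache.SetValidUntilExpires(true);

        // Cache expensive query results in memory so repeated hits skip the database.
        var products = HttpRuntime.Cache["ProductList"] as string[];
        if (products == null)
        {
            products = LoadProductsFromDatabase();   // placeholder for your data access call
            HttpRuntime.Cache.Insert(
                "ProductList",
                products,
                null,                                 // no cache dependency in this sketch
                DateTime.UtcNow.AddMinutes(5),        // absolute expiration
                System.Web.Caching.Cache.NoSlidingExpiration);
        }
    }

    private static string[] LoadProductsFromDatabase()
    {
        // Hypothetical data access call.
        return new[] { "Example product" };
    }
}
```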
In addition, using a VM is likely to cost you up to about 10% in performance -- make sure it's really worth that for what it buys you in terms of improved manageability.

Can Access 2007 work well with 30 parallel users?

Can Access 2007 work well with 30 parallel users through my C# program?
Thanks in advance.
Access is not very good for concurrent use. I have seen recommendations of a maximum of 10 people at one time.
To be honest, it depends on how these users are working and the load they generate; however, it is not designed for such use (it is designed to be a desktop database, not an enterprise database), so it may fail under such usage. Use a database designed for your scenario, something like MySQL or SQL Server Express if you want to avoid extra costs.
See this article on 15seconds for a discussion of the suitability (or lack thereof) of Access for concurrent usage.
The Jet and ACE database engines can support 255 users, not just 255 concurrent connections. This is because the standard for interaction with a Jet/ACE data store is a single connection for each user, opened and then re-used throughout the session. However, it definitely is the case that under normal usage Jet/ACE may open more than one connection per user, so 255 is not even a reliable theoretical limit.
Jet/ACE interacts with a data file, and maintains locking via its locking file (*.LDB). Contention for the data file and the LDB file can easily overwhelm the file system's ability to keep up, so in general, the practical limit on number of users is much lower than the 255 theoretical limit (you'll note that 255 is one less than a power of 2, hint, hint).
In real-world scenarios, a properly designed Access application with a Jet/ACE data store running on a reliable network and stored on a server with a native Windows file system can be quite stable into the 20-30 users range. But it depends on what those users are doing. The more that are read-only, the higher the number of simultaneous users that can be supported.
Experienced Access developers report engineering apps to work with as many as 100 simultaneous users, but at that point, you basically have to rewrite as an unbound app, and then you're giving up most of the advantages of Access as front end in order to nurse along a back end that is better used with a smaller user population.
My basic rule is that any time a user population reaches 15 simultaneous users, I start talking to the client about upsizing to SQL Server, not because it's required, but because they need to get used to the idea that as usage grows, they're going to need to upsize. Whether that happens at 15 users or 20 or 30 depends on the nature of the particular app. As I said above, if many of the users are read-only for most of their session, you have more headroom than if everybody is adding/updating records most of the time.
Given that a C# app is going to be an unbound app, I wouldn't think that 30 users should be terribly problematic, but I'm not a C# programmer. If it's new development and there's any possibility that the user population will grow beyond 30 users, it just seems like a no-brainer to me to build with a server back end instead of with Jet/ACE.
I never did it with 2007, but I had problems in the past with the XP version and only 3 users working 8 hours a day.
So, based on my previous experience, try to avoid it. Making your customer change their requirement will be easier than dealing with the problems that come from using Access in a parallel environment. After all, also based on my experience... your customer will be changing their requirements almost every week! :D
May the Force be with you.
