I know the answer to this question for the most part is "It Depends", however I wanted to see if anyone had some pointers.
We execute queries on each request in ASP.NET MVC. On each request we need to get user rights information and various data for the views we are displaying. How many is too many? I know I should be conscious of the number of queries I am executing. I would assume that if they are small, well-optimized queries, half a dozen should be okay. Am I right?
What do you think?
Premature optimization is the root of all evil :)
First create your application; if it is sluggish, determine the cause and optimize that part. Reducing the number of queries will save you time, but so will optimizing the queries you do have to run.
You could spend a whole day shaving 50% off a query that only took 2 milliseconds to begin with, or spend 2 hours removing some INNER JOINs that made another query take 10 seconds. Analyse what's wrong before you start optimising.
The optimal amount would be zero.
Given that this is most likely not achievable, the only reasonable thing to say about it is: "As little as possible".
Simplify your site design until it's as simple as possible while still meeting your client's requirements.
Cache information that can be cached.
Pre-load information into the cache outside the request, where you can.
Ask only for the information that you need in that request.
If you need to make a lot of independent queries for a single request, parallelise the loading as much as possible.
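As a rough sketch of what I mean by parallelising (the LoadUserRights/LoadMenuData methods are stand-ins for whatever queries your views actually need, and it assumes .NET 4.5+ for Task.Run/await):

```csharp
using System.Threading.Tasks;

public class ViewDataLoader
{
    public async Task LoadAsync(int userId)
    {
        // Start both queries without awaiting, so they overlap instead of
        // running one after the other.
        Task<string> rightsTask = Task.Run(() => LoadUserRights(userId));
        Task<string> menuTask = Task.Run(() => LoadMenuData(userId));

        await Task.WhenAll(rightsTask, menuTask);

        string rights = rightsTask.Result;  // total wall time is roughly the slowest query
        string menu = menuTask.Result;
        // ...build the view model from rights and menu...
    }

    private string LoadUserRights(int userId) { /* query #1 */ return "rights"; }
    private string LoadMenuData(int userId) { /* query #2 */ return "menu"; }
}
```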
What you're left with is the 'optimal' amount for that site.
If that's too slow, you need to review the above again.
User rights information may be able to be cached, as may other common information you display everywhere.
You can probably get away with caching more than the requirements necessitate. For instance - you can probably cache 'live' information such as product stock levels, and the user's shopping cart. Use SQL Change Notifications to allow you to expire and repopulate the cache in the background.
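A sketch of how that background expiry/repopulation might look with SqlDependency (query notifications). The connection string, Products table and columns are placeholders, and Service Broker must be enabled on the database for this to work:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public class StockLevelCache
{
    private readonly string _connectionString;
    private volatile Dictionary<int, int> _stockLevels;

    public StockLevelCache(string connectionString)
    {
        _connectionString = connectionString;
        SqlDependency.Start(_connectionString);   // once per app domain
        Reload();
    }

    private void Reload()
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "SELECT ProductId, UnitsInStock FROM dbo.Products", connection))
        {
            // Notifications are one-shot: re-register on every reload.
            var dependency = new SqlDependency(command);
            dependency.OnChange += (s, e) => Reload();

            connection.Open();
            var levels = new Dictionary<int, int>();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    levels[reader.GetInt32(0)] = reader.GetInt32(1);
            }
            _stockLevels = levels;   // swap in the fresh snapshot
        }
    }

    public int? GetStock(int productId)
    {
        int units;
        return _stockLevels.TryGetValue(productId, out units) ? units : (int?)null;
    }
}
```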
As few as possible.
Use caching for lookups. Also store some light-weight data (such as permissions) in the session.
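For instance, a minimal sketch of the session idea (the helper, the session key and the database call are all made-up names):

```csharp
using System.Collections.Generic;
using System.Web;

public static class PermissionHelper
{
    public static IList<string> GetPermissions(HttpSessionStateBase session, int userId)
    {
        // Reuse the permissions loaded earlier in the session, if any.
        var permissions = session["UserPermissions"] as IList<string>;
        if (permissions == null)
        {
            permissions = GetPermissionsFromDatabase(userId); // one query per session
            session["UserPermissions"] = permissions;
        }
        return permissions;
    }

    private static IList<string> GetPermissionsFromDatabase(int userId)
    {
        // ...SELECT the permission names for this user...
        return new List<string>();
    }
}
```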
Q: Do you have a performance problem related to database queries?
Yes? A: Fewer than you have now.
No? A: The exact same number you have now.
If it ain't broke, don't fix it.
While refactoring and optimizing to save a few milliseconds is a fun and intellectually rewarding way for programmers to spend time, it is often a waste of time.
Also, changing your code to combine database requests could come at the cost of simplicity and maintainability in your code. That is, while it may be technically possible to combine several queries into one, that could require removing the conceptual isolation of business objects in your code, which is bad.
You can make as many queries as you want, until your site gets too slow.
As many as necessary, but no more.
In other words, the performance bottlenecks will not come from the number of queries, but what you do in the queries and how you deal with the data (e.g. caching a huge yet static resultset might help).
Along with all the other recommendations about making fewer trips, it also depends on how much data is retrieved on each round trip. If it is just a few bytes, the application can probably afford to be chatty without hurting performance. However, if each trip returns hundreds of KB, performance will suffer much sooner.
You have answered your own question "It depends".
That said, trying to pin down an optimal number of queries per HTTP request is not a meaningful exercise. If your SQL Server has really good hardware behind it, you can run a good number of queries quickly and still have a low turnaround time for the HTTP request. So basically, "it depends", as you rightly said.
As the comments above indicate, some caching is likely appropriate for your situation. And like your question suggests, the real answer is "it depends." Generally, the fewer the queries, the better since each query has a cost associated with it. You should examine your data model and your application's requirements to determine what is appropriate.
For example, if a user's rights are likely to be static during the user's session, it makes sense to cache the rights data so fewer queries are required. If aspects of the data displayed in your View are also static for a user's session, these could also be cached.
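As a rough illustration of caching the rights data (the key format, the 10-minute window and the loader delegate are arbitrary choices here, not a prescription):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class RightsCache
{
    public static object GetRights(int userId, Func<int, object> loadFromDb)
    {
        string key = "rights:" + userId;
        object rights = HttpRuntime.Cache[key];
        if (rights == null)
        {
            rights = loadFromDb(userId);   // only hit the database on a cache miss
            HttpRuntime.Cache.Insert(
                key, rights, null,
                DateTime.UtcNow.AddMinutes(10),   // absolute expiration
                Cache.NoSlidingExpiration);
        }
        return rights;
    }
}
```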
I am trying to mimic some desktop software. The users are accustomed to never saving. The software saves as they change values.
I'm using blur and change events in jquery to trigger updates.
Clearly, this is going to use a lot of unnecessary bandwidth, but it does meet the requirements.
I have no problem doing this, but I want to ask if there is a clear, definitive reason not to do this?
Is there a clearly preferable alternative? Saving every few seconds for instance.
edit - I should note that the updates are segregated, so not all of the data is sent and received in each update. It may be up to 4 or 5 tables and 200 or so fields at once, but more typically it's a couple of tables and 10 or so fields.
Your exact requirement seems a little vague, but as far as I understand it, you seem to be doing the correct thing.
You can refine things a bit if you wish:
1. Serialization. Sending data serialized as XML is not the same as sending it as JSON; for better bandwidth savings, JSON serialization is recommended.
2. Encoding. To properly analyse bandwidth usage, consider what kind of data you're sending to the backend. Does it make sense to send it plain, or could you take advantage of a compression algorithm? And does that extra computation have a noticeable performance impact on your solution?
3. Scheduling. This really depends on your requirements, but does it really make sense to sync on every change? Can you take the risk of syncing at intervals and possibly losing some changes? This decision can have a huge impact on the total bandwidth usage of your application.
4. Local storage. Depending on how you need to meet your requirements, you could also take advantage of HTML5 local storage, particularly in light of your decision on point 3. Just an idea.
I read this question-answer explaining that usage of second level cache on 50,000 rows isn't efficient.
So for what amount of data is the NHibernate second-level cache helpful, and when does it stop helping or even ruin performance?
For example: if I have 3,500 Employees (Which I still don't...) will it be a good thing to use the second level cache?
You should mainly use it for 'static' data. An example is a website whose business is selling flight tickets via a shopping site. The shopping bag, orders and order lines are volatile data; those are not cached.
But the location data such as airports, the airline data, and all the connected names in different languages are 'static'. Those can be cached for a long time and will then not cause round trips to the database every time your app needs them.
So, make a distinction between your static and volatile data.
What exactly to cache, what not to, and for how long always depends on how your application is used, of course. Use different cache regions with different expiry times when needed.
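For the 'static' entities, a cacheable query might look roughly like this. It assumes a second-level cache provider (e.g. SysCache) and the query cache are already configured, and that Airport is one of your mapped, rarely-changing reference entities:

```csharp
using System.Collections.Generic;
using NHibernate;

public IList<Airport> GetAirports(ISessionFactory sessionFactory)
{
    using (ISession session = sessionFactory.OpenSession())
    {
        return session
            .CreateQuery("from Airport")
            .SetCacheable(true)                 // results go into the query cache
            .SetCacheRegion("reference-data")   // long-lived region for static data
            .List<Airport>();
    }
}
```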
Unfortunately, the answer to that kind of question is not trivial.
Caches will almost always improve your performance when data is read more than it's written, but the only way to see if it helps in your particular case is profiling.
Also, it's never an all-or-nothing proposition. You will likely benefit from caching some entities and some queries. With different lifetimes, usages, etc.
I am designing a database and I would like to normalize it. In one query I will be joining about 30-40 tables. Will this hurt the website's performance if it ever becomes extremely popular? This will be the main query, and it will be called 50% of the time. In the other queries I will be joining about two tables.
I have a choice right now to normalize or not to normalize, but if normalization becomes a problem in the future I may have to rewrite 40% of the software, and that could take me a long time. Does normalization really hurt in this case? Should I denormalize now while I have the time?
I quote: "normalize for correctness, denormalize for speed - and only when necessary"
I refer you to: In terms of databases, is "Normalize for correctness, denormalize for performance" a right mantra?
HTH.
When performance is a concern, there are usually better alternatives than denormalization:
Creating appropriate indexes and statistics on the involved tables
Caching
Materialized views (Indexed views in MS SQL Server)
Having a denormalized copy of your tables (used exclusively for the queries that need them), in addition to the normalized tables that are used in most cases (requires writing synchronization code, that could run either as a trigger or a scheduled job depending on the data accuracy you need)
Normalization can hurt performance. However this is no reason to denormalize prematurely.
Start with full normalization and then you'll see if you have any performance problems. At the rate you are describing (1000 updates/inserts per day) I don't think you'll run into problems unless the tables are huge.
And even then, there are tons of database optimization options (indexes, prepared stored procedures, materialized views, ...) that you can use.
Maybe I'm missing something here, but if your architecture requires you to join 30 to 40 tables in a single query, and that query is the main use of your site, then you have larger problems.
I agree with the others: don't prematurely optimize your site. However, you should optimize your architecture to account for your main use case. A 40-table join for a query run over 50% of the time is not optimized, IMO.
Don't make early optimizations. Denormalization isn't the only way to speed up a website. Your caching strategy is also quite important and if that query of 30-40 tables is of fairly static data, caching the results may prove to be a better optimization.
Also, take into account the ratio of writes to reads. If you are doing approximately 10 reads for every insert or update, you could say that data is fairly static, hence you should cache it for some period of time.
If you end up denormalizing your schema, your writes will also become more expensive and potentially slow things down as well.
Really analyze your problem before making too many optimizations, and wait to see where the bottlenecks in the system really are; you might end up being surprised at what you should optimize in the first place.
I am building a system where data will be added by the user every 30 seconds or so. In this case, should I go for a batch insert (insert every 2 minutes) or insert every time the user enters the data? The system is to be built on C# 3.5 and SQL Server.
Start with normal inserts. You're nowhere near having to optimize this.
If performance becomes a problem, or it would be obvious that it may be a concern, only then do you need to look at optimizing -- and even then, it may not be an issue with inserts! Use a profiler to determine the bottleneck.
Once every 30 seconds is no significant stress. By the KISS principle, I prefer one-by-one in this case.
If it is every 30 seconds, I would insert immediately, provided the insert is as quick as it should be (under 2 seconds even for large data).
If you anticipate future growth towards more frequent transactions, then consider batching the inserts.
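If you do end up batching, one rough sketch is to buffer rows in memory and flush them with SqlBulkCopy on a timer. The table and column names here are invented:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public class ReadingBuffer
{
    private readonly DataTable _pending = new DataTable();
    private readonly string _connectionString;

    public ReadingBuffer(string connectionString)
    {
        _connectionString = connectionString;
        _pending.Columns.Add("UserId", typeof(int));
        _pending.Columns.Add("Value", typeof(decimal));
        _pending.Columns.Add("EnteredAt", typeof(DateTime));
    }

    public void Add(int userId, decimal value)
    {
        lock (_pending)
            _pending.Rows.Add(userId, value, DateTime.UtcNow);
    }

    // Call this from a timer every couple of minutes.
    public void Flush()
    {
        DataTable batch;
        lock (_pending)
        {
            if (_pending.Rows.Count == 0) return;
            batch = _pending.Copy();
            _pending.Clear();
        }

        using (var bulk = new SqlBulkCopy(_connectionString))
        {
            bulk.DestinationTableName = "dbo.Readings";
            bulk.ColumnMappings.Add("UserId", "UserId");
            bulk.ColumnMappings.Add("Value", "Value");
            bulk.ColumnMappings.Add("EnteredAt", "EnteredAt");
            bulk.WriteToServer(batch);
        }
    }
}
```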
It really depends on your requirements, are you able to give more information?
For example:
Are you running a web app?
Is the data time-sensitive (i.e. to be used by other users)?
Do you have to worry about concurrent usage of the data?
What volume of transactions are you looking at?
For small amounts of data I would simply insert the rows as the user requires; the gain from caching will likely be minimal and it simplifies your implementation. If the usage is expected to be quite high, come back and look at optimising the design. It comes back to the general rule of avoiding premature optimisations :)
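In other words, something as plain as this is usually fine at that volume (the table and columns are placeholders):

```csharp
using System.Data.SqlClient;

public void SaveReading(string connectionString, int userId, decimal value)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "INSERT INTO dbo.Readings (UserId, Value, EnteredAt) " +
        "VALUES (@userId, @value, GETUTCDATE())", connection))
    {
        // Parameterised to avoid SQL injection and plan-cache churn.
        command.Parameters.AddWithValue("@userId", userId);
        command.Parameters.AddWithValue("@value", value);
        connection.Open();
        command.ExecuteNonQuery();
    }
}
```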
I'm still yet to find a decent solution to my scenario. Basically I have an ASP.NET MVC website which has a fair bit of database access to make the views (2-3 queries per view) and I would like to take advantage of caching to improve performance.
The problem is that the views contain data that can change irregularly, like it might be the same for 2 days or the data could change several times in an hour.
The queries are quite simple (select... from where...) and not huge joins, each one returns on average 20-30 rows of data (with about 10 columns).
The queries are quite simple at the site's current stage, but over time the owner will be adding more data and the visitor numbers will increase. They are large at the moment, and I would be looking at caching as traffic will mostly be coming from Google AdWords etc. and fast-loading pages will be a benefit (apparently).
The site will be hosted on a Microsoft SQL Server 2005 database (But can upgrade to 2008 if required).
Do I either:
Set the caching to the minimum time an item doesn't change for (e.g. cache for, say, 3 minutes) and tell the owner that any changes will take up to 3 minutes to appear?
Find a way to force the cache to clear and reprocess on changes (E.g. if the owner adds an item in the administration panel it clears the relevant caches)
Forget caching all together
Or is there an option that would be suit this scenario?
If you are using Sql Server, there's also another option to consider:
Use the SqlCacheDependency class to have your cache invalidated when the underlying data is updated. Obviously this achieves a similar outcome to option 2.
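Roughly like this (a sketch only: it assumes notifications have been enabled for the database and table, e.g. via aspnet_regsql or SqlCacheDependencyAdmin and a sqlCacheDependency entry in web.config, and "MyDb"/"Products" are placeholder names):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class ProductCache
{
    public static object GetProducts(Func<object> loadFromDb)
    {
        object products = HttpRuntime.Cache["products"];
        if (products == null)
        {
            products = loadFromDb();
            HttpRuntime.Cache.Insert(
                "products",
                products,
                new SqlCacheDependency("MyDb", "Products")); // evicted when the table changes
        }
        return products;
    }
}
```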
I might actually have to agree with Agileguy though - your query descriptions seem pretty simplistic. Thinking forward and keeping caching in mind while you design is a good idea, but have you proven that you actually need it now? Option 3 seems a heck of a lot better than option 1, assuming you aren't actually dealing with significant performance problems right now.
Premature optimization is the root of all evil ;)
That said, if you are going to Cache I'd use a solution based around option 2.
You have less opportunity for "dirty" data in that manner.
Kindness,
Dan
The 2nd option is the best. It shouldn't be too hard if the same app edits and caches the data; it can be trickier if there is more than one app.
If you can't go that way, the 1st might be acceptable too. With some tweaks (e.g. I would try to update the cache silently on another thread when it hits the timeout) it might work well enough, if the data is allowed to be a bit old.
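Something along these lines is what I mean by updating silently (a sketch only; the refresh interval and loader delegate are placeholders):

```csharp
using System;
using System.Threading;

public class BackgroundRefreshedCache<T> where T : class
{
    private volatile T _value;
    private readonly Timer _timer;

    public BackgroundRefreshedCache(Func<T> loadFromDatabase, TimeSpan interval)
    {
        _value = loadFromDatabase();                          // initial load
        _timer = new Timer(_ => _value = loadFromDatabase(),  // silent refresh on a timer thread
                           null, interval, interval);
    }

    public T Value
    {
        get { return _value; }   // readers never wait on the database
    }
}
```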
Never drop caching entirely if you can avoid it. Everyone knows the "premature optimization..." line, but caching is one of those things that can increase the scalability/performance of an application dramatically.