I read this question and answer explaining that using the second-level cache on 50,000 rows isn't efficient.
So at what amount of data does the NHibernate second-level cache help, and when does it not help or even hurt performance?
For example: if I have 3,500 Employees (which I still don't...), would it be a good idea to use the second-level cache?
You should mainly use it for 'static' data. An example is a website that sells flight tickets through a shopping site. The shopping bag, orders and order lines are volatile data; those are not cached.
But location data like airports, the airline data and all the connected names in different languages are 'static'. Those can be cached for a long time and will then not cause round trips to the database every time your app needs them.
So, make a distinction between your static and volatile data.
What exactly to cache, what not to cache, and for how long always depends on the usage of your application, of course. Use different cache regions with different expiry times when needed.
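For example, a minimal mapping-by-code sketch of marking a static reference entity as cacheable in its own region (the Airport entity and the "reference-data" region name are just illustrative, not from the question):

    using NHibernate.Mapping.ByCode;
    using NHibernate.Mapping.ByCode.Conformist;

    // Hypothetical static reference entity.
    public class Airport
    {
        public virtual string Code { get; set; }
        public virtual string Name { get; set; }
    }

    public class AirportMap : ClassMapping<Airport>
    {
        public AirportMap()
        {
            // Static data: immutable and cached read-only in its own region,
            // so the region can be given a long expiry in the cache provider's config.
            Mutable(false);
            Cache(c =>
            {
                c.Usage(CacheUsage.ReadOnly);
                c.Region("reference-data");
            });

            Id(x => x.Code, m => m.Generator(Generators.Assigned));
            Property(x => x.Name);
        }
    }

Remember that the second-level cache also has to be switched on in the session-factory configuration (the cache.use_second_level_cache setting plus a cache provider); the expiry per region is then configured in that provider.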
Unfortunately, the answer to that kind of question is not trivial.
Caches will almost always improve your performance when data is read more than it's written, but the only way to see if it helps in your particular case is profiling.
Also, it's never an all-or-nothing proposition. You will likely benefit from caching some entities and some queries, with different lifetimes, usages, etc.
I have a Web API that provides complex statistical/forecast data. I have one endpoint that can take up to 20s to complete, so I started looking at caching to boost the performance. My situation is very different from those described in many examples, so I need help.
Long story short, the method returns a batch of forecasts and statistics for an item. For a single item it's as quick as 50ms, which is good. But there is also a (very complex) method that needs 2,000-3,000 items AT ONCE to calculate different statistics. And this is a problem.
There are probably around 250,000 items in the database, around 200M rows in one table. The good part is: the table only updates ONCE per day, and I would need around 1GB of data (around 80M "optimized" rows).
So my idea was: once per day (I know exactly when), the API would query, transform, optimize and put into memory 1GB of data from that table, and for the rest of the day it would be lightning fast.
My question is: is this a good idea? Should I use some external provider (like Memcached or Redis) or just a singleton list with proper locking using semaphores etc.?
If Memcached, how can I do it? I don't want to cache this table "as is"; it's too big. I need to do some transformation first.
Thanks!
This is a good solution if you are not limited by server RAM, imo. Since it's .NET Core, you can try System.Runtime.Caching.MemoryCache.
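A rough sketch of the "load once per day, serve from memory all day" idea with System.Runtime.Caching.MemoryCache (the ForecastRow shape, cache key and 25-hour expiry are placeholders, not anything from the question):

    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching;

    // Hypothetical shape of one "optimized" row.
    public class ForecastRow
    {
        public int ItemId { get; set; }
        public double Forecast { get; set; }
    }

    public class ForecastCache
    {
        private const string CacheKey = "OptimizedForecastRows";
        private static readonly MemoryCache Cache = MemoryCache.Default;

        // Run once per day, right after the source table has been refreshed.
        public void Reload(IEnumerable<ForecastRow> transformedRows)
        {
            var byItem = new Dictionary<int, List<ForecastRow>>();
            foreach (var row in transformedRows)
            {
                List<ForecastRow> list;
                if (!byItem.TryGetValue(row.ItemId, out list))
                    byItem[row.ItemId] = list = new List<ForecastRow>();
                list.Add(row);
            }

            // Swapping the whole dictionary in a single Set keeps readers lock-free;
            // expire a little after the next expected refresh as a safety net.
            Cache.Set(CacheKey, byItem, DateTimeOffset.UtcNow.AddHours(25));
        }

        // Called by the heavy endpoint for each of its 2,000-3,000 items.
        public IReadOnlyList<ForecastRow> GetRows(int itemId)
        {
            var byItem = Cache.Get(CacheKey) as Dictionary<int, List<ForecastRow>>;
            List<ForecastRow> rows;
            return byItem != null && byItem.TryGetValue(itemId, out rows)
                ? (IReadOnlyList<ForecastRow>)rows
                : Array.Empty<ForecastRow>();
        }
    }

Because the data is read-only between refreshes, nothing needs a semaphore: readers always see either the old dictionary or the new one.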
To query the database every time and use a 'WHERE' clause?
SELECT * FROM tblProduct WHERE productID = #productID
OR
To filter the products list that has been put into the cache?
DataTable dtProducts = HttpContext.Current.Cache["CachedProductList"] as DataTable;
if (dtProducts != null) // the cached table may have been evicted
{
    DataView dvProduct = dtProducts.DefaultView;
    dvProduct.RowFilter = String.Format("[productID] = {0}", iProductID);
}
Please share your opinion. Thanks in advance.
Performance is very dependent on your data and how you use it. The only way to know for sure what works is to benchmark.
Decide to cache only when your database performance does not meet the performance you require.
When you cache data, you add a lot of overhead in making sure it is up to date.
SQL Server does not read from disk every time you fire a query; it caches data pages and execution plans for frequently run queries. Before you decide to cache, learn the caching mechanisms your database already uses. Using a stored procedure also helps with query-plan reuse.
Caching data, especially through an in-memory mechanism like HttpContext.Current.Cache, is (almost) always going to be faster than going back to the database. Going to the database requires establishing network connections, then the database has to do I/O, etc., whereas with the cache you just use objects in memory. That said, there are a number of things you have to take into account:
The ASP.NET runtime cache is not distributed. If you will be running this code on multiple nodes, you have to decide whether you're okay with different nodes potentially having different versions of the cached data.
Caches can be told to hold onto data for as long as you want them to, as short as just a few minutes and as long as forever. You have to take into consideration how long the data is going to remain unchanged when deciding how long to cache it. Product data probably doesn't change more often than once a day, so it's a very viable candidate for caching.
Be aware though that the cache time limits you set are not absolutes; objects can be evicted from the cache because of memory limits or when a process/app pool recycles.
As pointed out above, DataTable is not a good object to cache; it's very bulky and expensive to serialize. A list of custom classes is a much better choice from a performance standpoint.
I would say as a general rule of thumb, if you need a set of data more frequently than a few times an hour and it changes less frequently than every few hours, it would be better to pull the list from the database, cache it for a reasonable amount of time, and retrieve it by a filter in code. But that's a general rule; this is the kind of thing that's worth experimenting with in your particular environment.
200,000 objects is a lot of data to put into a cache, but it's also a lot of work for the database if you have to retrieve it frequently. Perhaps there's some subset of it that would be better to cache, and a different, less frequently used subset that could be retrieved every time it's needed. As I said, experiment!
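As a sketch of the "cache the list, filter in code" approach above, using a list of plain classes rather than a DataTable (the Product shape, cache key and one-hour expiry are only illustrative; LoadProductsFromDatabase stands in for your existing data access):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.Caching;

    public class Product
    {
        public int ProductID { get; set; }
        public string Name { get; set; }
    }

    public static class ProductCache
    {
        private const string Key = "CachedProductList";

        public static List<Product> GetAll()
        {
            var products = HttpRuntime.Cache[Key] as List<Product>;
            if (products == null)
            {
                products = LoadProductsFromDatabase();   // your existing data access
                HttpRuntime.Cache.Insert(
                    Key, products, null,
                    DateTime.UtcNow.AddHours(1),         // tune to how often the data changes
                    Cache.NoSlidingExpiration);
            }
            return products;
        }

        public static Product GetById(int productId)
        {
            // In-memory filter instead of a round trip with a WHERE clause.
            return GetAll().FirstOrDefault(p => p.ProductID == productId);
        }

        private static List<Product> LoadProductsFromDatabase()
        {
            // Placeholder for the real SELECT against tblProduct.
            return new List<Product>();
        }
    }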
I would prefer the first method. Having 20,000 rows in the cache does not sound good to me.
In a distributed caching scenario, is it generally advised to use or avoid monolithic objects stored in cache?
I'm working with a service backed by an EAV schema, so we're putting caching in place to minimize the perceived performance deficit imposed by EAV when retrieving all primary records and respective attribute collections from the database. We will prime the cache on service startup.
We don't have particularly frequent calls for all products -- clients call for differentials after they first populate their local cache with the object map. In order to perform that differential, the distributed cache will need to reflect changes to individual records in the database, which happen on an arbitrary basis, and be checked for changes as differentials are requested by clients.
First thought was to use a List or Dictionary to store the records in the distributed cache -- get the whole collection, manipulate or search it in-memory locally, put the whole collection back into the cache. Later thinking however led to the idea of populating the cache with individual records, each keyed in a way to make them individually retrievable from/updatable to the cache. This led to wondering which method would be more performant when it comes to updating all data.
We're using Windows Server AppFabric, so we have a BulkGet operation available to us. I don't believe there's any notion of a bulk update however.
Is there prevailing thinking as to distributed cache object size? If we had more requests for all items, I would have concerns about network bandwidth, but, for now at least, demand for all items should be fairly minimal.
And yes, we're going to test and profile each method, but I'm wondering if there's anything outside the current scope of thinking to consider here.
So in our scenario, it appears that monolithic cache objects are going to be preferred. With big fat pipes in the datacenter, it takes virtually no perceptible time for ~30 MB of serialized product data to cross the wire. Using a Dictionary<TKey, TValue> we are able to quickly find products in the collection in order to return, or update, the individual item.
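For reference, a rough sketch of that monolithic approach against the AppFabric DataCache API (the Product type and cache key are illustrative; concurrency handling is left out, so if several writers can race, the optimistic-locking overloads that work with DataCacheItemVersion would be safer):

    using System.Collections.Generic;
    using Microsoft.ApplicationServer.Caching; // AppFabric cache client

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class ProductCatalogCache
    {
        private const string CatalogKey = "ProductCatalog";
        private readonly DataCache _cache;

        public ProductCatalogCache(DataCache cache)
        {
            _cache = cache;
        }

        // Pull the whole ~30 MB dictionary once, then look items up locally.
        public Product GetProduct(int id)
        {
            var catalog = _cache.Get(CatalogKey) as Dictionary<int, Product>;
            Product product;
            return catalog != null && catalog.TryGetValue(id, out product) ? product : null;
        }

        // Read-modify-write of the whole collection for a single-item change.
        public void UpdateProduct(Product changed)
        {
            var catalog = _cache.Get(CatalogKey) as Dictionary<int, Product>
                          ?? new Dictionary<int, Product>();
            catalog[changed.Id] = changed;
            _cache.Put(CatalogKey, catalog);
        }
    }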
With thousands of individual entities, all well under 1 MB, in the cache, bulk operations simply take too long: too much overhead and latency in the network operations.
Edit: we're now considering maintaining both the entities and the monolithic collection of entities, because with the monolith, it appears that retrieving individual entities becomes a fairly expensive process with a production dataset.
I know the answer to this question for the most part is "It Depends", however I wanted to see if anyone had some pointers.
We execute queries on each request in ASP.NET MVC. On each request we need to get user rights information and various data for the views we are displaying. How many is too many? I know I should be conscious of the number of queries I am executing. I would assume that if they are small, optimized queries, half a dozen should be okay? Am I right?
What do you think?
Premature optimization is the root of all evil :)
First create your application; if it is sluggish, you will have to determine the cause and optimize that part. Sure, reducing the number of queries will save you time, but so will optimizing the queries you do have to run.
You could spend a whole day shaving 50% off a query that only took 2 milliseconds to begin with, or spend 2 hours removing some INNER JOINs that made another query take 10 seconds. Analyse what's wrong before you start optimising.
The optimal amount would be zero.
Given that this is most likely not achievable, the only reasonable thing to say about is: "As little as possible".
Simplify your site design until it's as simple as possible while still meeting your client's requirements.
Cache information that can be cached.
Pre-load information into the cache outside the request, where you can.
Ask only for the information that you need in that request.
If you need to make a lot of independent queries for a single request, parallelise the loading as much as possible.
What you're left with is the 'optimal' amount for that site.
If that's too slow, you need to review the above again.
User rights information may be able to be cached, as may other common information you display everywhere.
You can probably get away with caching more than the requirements necessitate. For instance - you can probably cache 'live' information such as product stock levels, and the user's shopping cart. Use SQL Change Notifications to allow you to expire and repopulate the cache in the background.
As few as possible.
Use caching for lookups. Also store some light-weight data (such as permissions) in the session.
Q: Do you have a performance problem related to database queries?
Yes? A: Fewer than you have now.
No? A: The exact same number you have now.
If it ain't broke, don't fix it.
While refactoring and optimizing to save a few milliseconds is a fun and intellectually rewarding way for programmers to spend time, it is often a waste of time.
Also, changing your code to combine database requests could come at the cost of simplicity and maintainability in your code. That is, while it may be technically possible to combine several queries into one, that could require removing the conceptual isolation of business objects in your code, which is bad.
You can make as many queries as you want, until your site gets too slow.
As many as necessary, but no more.
In other words, the performance bottleneck will not come from the number of queries, but from what you do in the queries and how you deal with the data (e.g. caching a huge yet static result set might help).
Along with all the other recommendations to make fewer trips, it also depends on how much data is retrieved on each round trip. If each trip returns just a few bytes, being chatty probably won't hurt performance. However, if each trip returns hundreds of KB, your performance will suffer much sooner.
You have answered your own question "It depends".
That said, trying to pin down an optimal number of queries per HTTP request is not really a sensible exercise. If your SQL Server is backed by really good hardware, then you could run a good number of queries in less time and still have a very low turnaround time for the HTTP request. So basically, "it depends", as you rightly said.
As the comments above indicate, some caching is likely appropriate for your situation. And like your question suggests, the real answer is "it depends." Generally, the fewer the queries, the better since each query has a cost associated with it. You should examine your data model and your application's requirements to determine what is appropriate.
For example, if a user's rights are likely to be static during the user's session, it makes sense to cache the rights data so fewer queries are required. If aspects of the data displayed in your View are also static for a user's session, these could also be cached.
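For instance, a small sketch of caching per-user rights for roughly a session's length, keyed by user (the UserRights shape, key format and 20-minute expiry are placeholders; LoadRightsFromDatabase stands in for the query you run on every request today):

    using System;
    using System.Web;
    using System.Web.Caching;

    public class UserRights
    {
        public bool CanEdit { get; set; }
        public bool CanPublish { get; set; }
    }

    public static class RightsCache
    {
        public static UserRights GetRights(string userName)
        {
            string key = "rights:" + userName;
            var rights = HttpRuntime.Cache[key] as UserRights;
            if (rights == null)
            {
                rights = LoadRightsFromDatabase(userName); // the per-request query this replaces
                HttpRuntime.Cache.Insert(
                    key, rights, null,
                    DateTime.UtcNow.AddMinutes(20),        // roughly a session's length
                    Cache.NoSlidingExpiration);
            }
            return rights;
        }

        private static UserRights LoadRightsFromDatabase(string userName)
        {
            // Placeholder for the real rights query.
            return new UserRights();
        }
    }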
I am designing a database and I would like to normalize it. In one query I will be joining about 30-40 tables. Will this hurt the website's performance if it ever becomes extremely popular? This will be the main query, and it will be called 50% of the time. In the other queries I will be joining about two tables.
I have a choice right now to normalize or not to normalize, but if the normalization becomes a problem in the future I may have to rewrite 40% of the software, and that could take me a long time. Does normalization really hurt in this case? Should I denormalize now while I have the time?
I quote: "normalize for correctness, denormalize for speed - and only when necessary"
I refer you to: In terms of databases, is "Normalize for correctness, denormalize for performance" a right mantra?
HTH.
When performance is a concern, there are usually better alternatives than denormalization:
Creating appropriate indexes and statistics on the involved tables
Caching
Materialized views (Indexed views in MS SQL Server)
Having a denormalized copy of your tables (used exclusively for the queries that need them), in addition to the normalized tables that are used in most cases (requires writing synchronization code, that could run either as a trigger or a scheduled job depending on the data accuracy you need)
Normalization can hurt performance. However this is no reason to denormalize prematurely.
Start with full normalization and then you'll see if you have any performance problems. At the rate you are describing (1000 updates/inserts per day) I don't think you'll run into problems unless the tables are huge.
And even if you do, there are tons of database optimization options (indexes, prepared statements and stored procedures, materialized views, ...) that you can use.
Maybe I'm missing something here, but if your architecture requires you to join 30 to 40 tables in a single query, and that query is the main use of your site, then you have larger problems.
I agree with the others: don't prematurely optimize your site. However, you should design your architecture to account for your main use case. A 40-table join in the query that runs over 50% of the time is not optimized, IMO.
Don't make early optimizations. Denormalization isn't the only way to speed up a website. Your caching strategy is also quite important and if that query of 30-40 tables is of fairly static data, caching the results may prove to be a better optimization.
Also, take into account the number of writes to the number of reads. If you are doing approximately 10 reads for every insert or update, you could say that data is fairly static, hence you should cache it for some period of time.
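As a sketch of that idea, the big, mostly-static query can sit behind a small cache-aside helper whose lifetime reflects the write rate (the names and the 30-minute TTL below are illustrative only):

    using System;
    using System.Runtime.Caching;

    public static class QueryCache
    {
        // Cache-aside: return the cached result if present, otherwise run
        // the expensive query and keep the result for `ttl`.
        public static T GetOrAdd<T>(string key, TimeSpan ttl, Func<T> load) where T : class
        {
            var cache = MemoryCache.Default;
            var cached = cache.Get(key) as T;
            if (cached != null)
                return cached;

            var value = load();
            cache.Set(key, value, DateTimeOffset.UtcNow.Add(ttl));
            return value;
        }
    }

    // Hypothetical usage for the 30-40 table join, refreshed at most twice an hour:
    // var result = QueryCache.GetOrAdd("main-page-data", TimeSpan.FromMinutes(30),
    //                                  () => RunMainPageQuery());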
If you end up denormalizing your schema, your writes will also become more expensive and potentially slow things down as well.
Really analyze your problem before making too many optimizations, and wait to see where the bottlenecks in your system really are; you might end up being surprised by what you should be optimizing in the first place.