ASP.NET MVC Caching scenario - c#

I have yet to find a decent solution to my scenario. Basically I have an ASP.NET MVC website which needs a fair bit of database access to build the views (2-3 queries per view), and I would like to take advantage of caching to improve performance.
The problem is that the views contain data that changes irregularly: it might stay the same for 2 days, or it could change several times in an hour.
The queries are quite simple (select... from where...) and not huge joins, each one returns on average 20-30 rows of data (with about 10 columns).
The queries are quite simple at the site's current stage, but over time the owner will be adding more data and the visitor numbers will increase. They are large at the moment and I would be looking at caching, as traffic will mostly be coming from Google AdWords etc. and fast-loading pages will be a benefit (apparently).
The site will be hosted on a Microsoft SQL Server 2005 database (But can upgrade to 2008 if required).
Do I:
1. Set the caching to the minimum time an item doesn't change for (e.g. cache for 3 minutes) and tell the owner that any changes will take up to 3 minutes to appear?
2. Find a way to force the cache to clear and reprocess on changes (e.g. if the owner adds an item in the administration panel, it clears the relevant caches)?
3. Forget caching altogether?
Or is there an option that would better suit this scenario?

If you are using SQL Server, there's also another option to consider:
Use the SqlCacheDependency class to have your cache invalidated when the underlying data is updated. Obviously this achieves a similar outcome to option 2.
I might actually have to agree with Agileguy though - your query descriptions seem pretty simplistic. Thinking forward and keeping caching in mind while you design is a good idea, but have you proven that you actually need it now? Option 3 seems a heck of a lot better than option 1, assuming you aren't actually dealing with significant performance problems right now.

Premature optimization is the root of all evil ;)
That said, if you are going to Cache I'd use a solution based around option 2.
You have less opportunity for "dirty" data in that manner.
Kindness,
Dan

The 2nd option is the best. It shouldn't be too hard if the same app both edits and caches the data. It can be trickier if there is more than one app.
If you can't go that way, the 1st might be acceptable too. With some tweaks (e.g. I would try to update the cache silently on another thread when it hits the timeout) it might work well enough (if the data is allowed to be a bit old).
Never drop caching if it's possible. Everyone knows the "premature optimization..." verse, but caching is one of those things that can increase the scalability/performance of an application dramatically.
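As a rough illustration of option 2 from the question, here is a minimal sketch of an in-process cache that is cleared explicitly when the owner saves a change, with a maximum age as a safety net (which is really option 1 layered on top). It assumes the administration panel and the public site run in the same process. `ViewDataCache`, the key names and the `load` delegate are hypothetical names for illustration, not part of ASP.NET.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-process cache: an entry lives until Invalidate() is called
// or until maxAge has passed, whichever comes first.
public sealed class ViewDataCache<TValue>
{
    private sealed class Entry
    {
        public TValue Value;
        public DateTime LoadedAtUtc;
    }

    private readonly ConcurrentDictionary<string, Entry> _entries =
        new ConcurrentDictionary<string, Entry>();
    private readonly TimeSpan _maxAge;
    private readonly Func<DateTime> _utcNow;

    // utcNow is optional and only exists so the clock can be faked in tests.
    public ViewDataCache(TimeSpan maxAge, Func<DateTime> utcNow = null)
    {
        _maxAge = maxAge;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    // Returns the cached value, calling load (the database query) on a miss.
    public TValue GetOrLoad(string key, Func<TValue> load)
    {
        Entry entry;
        if (_entries.TryGetValue(key, out entry) &&
            _utcNow() - entry.LoadedAtUtc < _maxAge)
        {
            return entry.Value;
        }

        // Concurrent misses may each run the query; the last write wins.
        var fresh = new Entry { Value = load(), LoadedAtUtc = _utcNow() };
        _entries[key] = fresh;
        return fresh.Value;
    }

    // Call from the admin panel's save action, after the database write.
    public void Invalidate(string key)
    {
        Entry removed;
        _entries.TryRemove(key, out removed);
    }
}
```

A controller action would call something like `cache.GetOrLoad("products", RunProductQuery)`, and the admin save action would call `cache.Invalidate("products")`. If more than one application or server edits the data, this in-process approach is no longer enough on its own.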

Related

Caching big data in .NET Core Web API

I have a Web API that provides complex statistical/forecast data. One endpoint can take as long as 20s to complete, so I started looking at caching to boost performance. My situation is very different from those described in many examples, so I need help.
Long story short, the method returns a batch of forecasts and statistics for an item. For a single item, it's as quick as 50ms, which is good. But there is also a (very complex) method that needs 2000-3000 items AT ONCE to calculate different statistics, and this is the problem.
There are probably around 250,000 items in the database, and around 200M rows in one table. The good part is that the table only updates ONCE per day, and I would need around 1GB of data (around 80M "optimized" rows).
So my idea was that once per day (I know exactly when) the API would query, transform, optimize and put into memory 1GB of data from that table, and during the day it would be lightning fast.
My question is: is this a good idea? Should I use some external provider (like Memcached or Redis), or just a singleton list with proper locking using semaphores etc.?
If Memcached, how can I do it? I don't want to cache this table "as is". It's too big. I need to do some transformation first.
Thanks!
This is a good solution if you are not limited by server RAM, in my opinion. Since it's .NET Core, you can try System.Runtime.Caching.MemoryCache.
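For the "singleton with proper locking" variant from the question, one possible shape is sketched below. It assumes the transformed data can be held as an immutable snapshot keyed by item id, so readers need no lock at all and the daily job just swaps a reference. `ForecastSnapshotStore` and the `loadAndTransform` delegate are hypothetical names; the real query and transformation are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical holder for the transformed data set. Readers never lock:
// they grab the current snapshot, and the daily job publishes a new one.
public sealed class ForecastSnapshotStore<TRow>
{
    private IReadOnlyDictionary<int, TRow[]> _current =
        new Dictionary<int, TRow[]>();

    // Called by request handlers; returns the rows for the requested item ids.
    public List<TRow> GetRows(IEnumerable<int> itemIds)
    {
        // Read the reference once so a single call sees one consistent snapshot.
        var snapshot = Volatile.Read(ref _current);
        var result = new List<TRow>();
        foreach (var id in itemIds)
        {
            TRow[] rows;
            if (snapshot.TryGetValue(id, out rows))
            {
                result.AddRange(rows);
            }
        }
        return result;
    }

    // Called once per day after the table update. The new snapshot is built
    // completely off to the side, then published with one reference swap.
    public void Reload(Func<IReadOnlyDictionary<int, TRow[]>> loadAndTransform)
    {
        var fresh = loadAndTransform();
        Interlocked.Exchange(ref _current, fresh);
    }
}
```

One thing to keep in mind with this design: while `Reload` is running, the old and the new snapshot are both in memory, so the server needs headroom for roughly twice the data set for a short time.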

MsgSetRequest - Can I throw too much data at it?

I'm working on a C# library project that will process transactions between SQL and QuickBooks Enterprise, keeping both data stores in sync. This is great and all, but the initial sync is going to be a fairly large set of transactions. Once the initial sync is complete, transactions will sync as needed for the remainder of the life of the product.
At this point, I'm fairly familiar with the SDK using QBFC, as well as all of the various resources and sample code available via the OSR, the ZOMBIE project by Paul Keister (thanks, Paul!) and others. All of these resources have been a huge help. But one thing I haven't come across yet is whether there is a limit or substantial or deadly performance cost associated with large amounts of data via a single Message Set Request. As I understand it, the database on QuickBooks' end is just a SQL database as well, but I don't want to make any assumptions.
Again, I just need to hit this hard once, so I don't want to engineer a separate solution to do the import. This also affords me an opportunity to test a copy of live data against my library, logs and all.
For what it's worth, this is my first ever post on Stack, so feel free to educate me on posting here if I've steered off course in any way. Thanks.
For what it's worth, I found that in a network environment (as opposed to everything happening on 1 box) it's better to have a larger MsgSetRequest as opposed to a smaller one. Of course everything has its limits, and maybe I just never hit it. I don't remember exactly how big the request set was, but it was big. The performance improvement was easily 10 to 1 or better.
If I were you, I'd build some kind of iteration into my design from the beginning (to iterate through your SQL data set). Start with a big number that will do it all at once, and if that breaks, just scale it back until you find something that works.
I know this answer doesn't have the detail you're looking for, but hopefully it will help.
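The "start big and scale back" idea could be sketched roughly like this. It deliberately does not touch the QuickBooks SDK: `sendBatch` is a hypothetical delegate standing in for whatever builds and sends one MsgSetRequest. It also assumes a failed batch leaves nothing half-applied, so retrying the same items is safe, which you would need to confirm for your setup.

```csharp
using System;
using System.Collections.Generic;

public static class BatchSender
{
    // Sends items in batches of up to initialBatchSize. When sendBatch throws,
    // the batch size is halved and the same items are retried.
    // Returns the batch size that was in use when the run finished.
    public static int SendAll<T>(
        IReadOnlyList<T> items,
        int initialBatchSize,
        Action<IReadOnlyList<T>> sendBatch)
    {
        if (initialBatchSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(initialBatchSize));
        }

        int batchSize = initialBatchSize;
        int position = 0;
        while (position < items.Count)
        {
            int count = Math.Min(batchSize, items.Count - position);
            var batch = new List<T>(count);
            for (int i = 0; i < count; i++)
            {
                batch.Add(items[position + i]);
            }

            try
            {
                sendBatch(batch);
                position += count;
            }
            catch (Exception)
            {
                // A single item that still fails cannot be split any further.
                if (count == 1) throw;
                batchSize = Math.Max(1, count / 2);
            }
        }
        return batchSize;
    }
}
```

Logging the returned batch size after the initial sync gives you a number to start from next time, instead of rediscovering the limit.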

How many SQL queries per HTTP request is optimal?

I know the answer to this question for the most part is "It Depends", however I wanted to see if anyone had some pointers.
We execute queries on each request in ASP.NET MVC. On each request we need to get user rights information and various data for the views that we are displaying. How many is too many? I know I should be conscious of the number of queries I am executing. I would assume that if they are small and optimized queries, half a dozen should be okay. Am I right?
What do you think?
Premature optimization is the root of all evil :)
First create your application; if it is sluggish, you will have to determine the cause and optimize that part. Sure, reducing the number of queries will save you time, but so will optimizing the queries that you have to run.
You could spend a whole day shaving 50% off a query that only took 2 milliseconds to begin with, or spend 2 hours removing some INNER JOINs that made another query take 10 seconds. Analyse what's wrong before you start optimising.
The optimal amount would be zero.
Given that this is most likely not achievable, the only reasonable thing to say about it is: "As few as possible".
Simplify your site design until it's as simple as possible while still meeting your client's requirements.
Cache information that can be cached.
Pre-load information into the cache outside the request, where you can.
Ask only for the information that you need in that request.
If you need to make a lot of independent queries for a single request, parallelise the loading as much as possible.
What you're left with is the 'optimal' amount for that site.
If that's too slow, you need to review the above again.
User rights information may be able to be cached, as may other common information you display everywhere.
You can probably get away with caching more than the requirements necessitate. For instance - you can probably cache 'live' information such as product stock levels, and the user's shopping cart. Use SQL Change Notifications to allow you to expire and repopulate the cache in the background.
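To make the "parallelise the loading" point concrete, here is a minimal sketch using tasks. The three delegates are hypothetical placeholders for independent async queries (user rights, view data, cart), and `PageData` is an invented container; nothing here is specific to ASP.NET MVC.

```csharp
using System;
using System.Threading.Tasks;

// Invented container for the data one page needs.
public sealed class PageData
{
    public string[] UserRights;
    public string[] Products;
    public int CartItemCount;
}

public static class PageDataLoader
{
    // The three delegates are placeholders for independent async queries.
    public static async Task<PageData> LoadAsync(
        Func<Task<string[]>> loadUserRights,
        Func<Task<string[]>> loadProducts,
        Func<Task<int>> loadCartItemCount)
    {
        // Start every query before awaiting any of them so they overlap.
        Task<string[]> rights = loadUserRights();
        Task<string[]> products = loadProducts();
        Task<int> cart = loadCartItemCount();

        await Task.WhenAll(rights, products, cart);

        // All tasks have completed, so reading Result does not block.
        return new PageData
        {
            UserRights = rights.Result,
            Products = products.Result,
            CartItemCount = cart.Result
        };
    }
}
```

The total wait is then roughly that of the slowest query rather than the sum of all of them. Each query would typically need its own database connection for this to work, so it only pays off when the queries really are independent.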
As few as possible.
Use caching for lookups. Also store some light-weight data (such as permissions) in the session.
Q: Do you have a performance problem related to database queries?
Yes? A: Fewer than you have now.
No? A: The exact same number you have now.
If it ain't broke, don't fix it.
While refactoring and optimizing to save a few milliseconds is a fun and intellectually rewarding way for programmers to spend time, it is often a waste of time.
Also, changing your code to combine database requests could come at the cost of simplicity and maintainability in your code. That is, while it may be technically possible to combine several queries into one, that could require removing the conceptual isolation of business objects in your code, which is bad.
You can make as many queries as you want, until your site gets too slow.
As many as necessary, but no more.
In other words, the performance bottlenecks will not come from the number of queries, but what you do in the queries and how you deal with the data (e.g. caching a huge yet static resultset might help).
Along with all the other recommendations about making fewer trips, it also depends on how much data is retrieved on each round trip. If it is just a few bytes, then the application can probably be chatty without hurting performance. However, if each trip returns hundreds of KB, performance will suffer much sooner.
You have answered your own question "It depends".
That said, trying to pin down an optimal number of queries per HTTP request is not really a sensible exercise. If your SQL Server has really good hardware behind it, then you could run a good number of queries in less time and still have a really low turnaround time for the HTTP request. So basically, "it depends", as you rightly said.
As the comments above indicate, some caching is likely appropriate for your situation. And like your question suggests, the real answer is "it depends." Generally, the fewer the queries, the better since each query has a cost associated with it. You should examine your data model and your application's requirements to determine what is appropriate.
For example, if a user's rights are likely to be static during the user's session, it makes sense to cache the rights data so fewer queries are required. If aspects of the data displayed in your View are also static for a user's session, these could also be cached.

ASP.NET Lucene Performance Improvements question

I have coded up an ASP.NET website that runs on Windows Server 2008 (remotely hosted). The application queries 11 very large Lucene indexes (each ~100GB). I open IndexSearchers on Page_load() and keep them open for the duration of the user session.
My questions:
The queries take ~5 seconds to complete. That is understandable, as these are very large indexes, but users want faster responses, and I am curious whether I can squeeze out better performance. (I did look over the Apache Lucene website and tried some of the ideas there.) I'm interested in whether and how you tweaked it further, especially from an ASP.NET perspective.
One idea was to use Solr instead of querying Lucene directly. But that seems counter-intuitive: it introduces another abstraction in between and might add to the latency. Is it worth the headache of porting to Solr? Can anyone share some metrics on the improvement you got after switching to Solr, if it was worth it?
Are there some key things that could be done in Solr that could be replicated to speed up response times?
Some questions / ideas:
Are you hitting all 11 indexes for a single request?
Can you reorganize the indexes so that you hit only 1 index (i.e. sharding) ?
Have you run a profile of the application (using dotTrace or similar tool)? Where is the time spent? Lucene.Net?
If most of the time is spent on Lucene.Net, then if you migrate to Solr the latency should be negligible (compared to the rest of the spent time). Plus, Solr can be easily distributed to increase performance.
I'm not all too familiar with Lucene (I use Solr) but if you're searching 11 indexes per request, can you run those searches in parallel (e.g. with TPL) ?
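A rough sketch of what searching the indexes in parallel with the TPL might look like. It avoids the Lucene.Net API on purpose: each entry in `searchers` is a hypothetical delegate wrapping one index's search and returning that index's best hits. It also assumes scores are comparable across indexes, which may not hold for separately built indexes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Invented result type: one hit from one index.
public sealed class Hit
{
    public string DocumentId;
    public float Score;
}

public static class MultiIndexSearch
{
    // searchers: one delegate per index, each a stand-in for a real index search.
    public static List<Hit> Search(
        IReadOnlyList<Func<string, IEnumerable<Hit>>> searchers,
        string query,
        int top)
    {
        var perIndex = new List<Hit>[searchers.Count];

        // Each iteration writes only its own slot, so no locking is needed.
        Parallel.For(0, searchers.Count, i =>
        {
            perIndex[i] = searchers[i](query).ToList();
        });

        // Merge the per-index hits and keep the overall best ones.
        return perIndex
            .SelectMany(hits => hits)
            .OrderByDescending(hit => hit.Score)
            .Take(top)
            .ToList();
    }
}
```

Whether this helps depends on where the 5 seconds go. If the indexes share one disk and the searches are I/O bound, running them side by side may gain little, which is another reason to profile first.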
The biggest thing is removing the search from the web tier and isolating it in its own tier (a search tier). That way, you have a dedicated box with dedicated resources that has the indexes loaded and "warmed up" in cache, instead of each user having their own copy of an index reader.

Caching architecture for search results in an ASP.NET application

What is a good design for caching the results of an expensive search in an ASP.NET system?
Any ideas would be welcomed ... particularly those that don't require inventing a complex infrastructure of our own.
Here are some general requirements related to the problem:
Each search can return anywhere from zero to several hundred result records
Each search is relatively expensive and time-consuming to execute (5-15 seconds at the database)
Results must be paginated before being displayed at the client to avoid information overload for the user
Users expect to be able to sort, filter, and search within the results returned
Users expect to be able to quickly switch between pages in the search results
Users expect to be able to select multiple items (via checkbox) on any number of pages
Users expect relatively snappy performance once a search has finished
I see some possible options for where and how to implement caching:
1. Cache on the server (in session or App cache), use postbacks or Ajax panels to facilitate efficient pagination, sorting, filtering, and searching.
PROS: Easy to implement, decent support from ASP.NET infrastructure
CONS: Very chatty, memory intensive on server, data may be cached longer than necessary; prohibits load balancing practices
2. Cache on the server (as above) but using serializable structures that are moved out of memory after some period of time to reduce memory pressure on the server
PROS: Efficient use of server memory; ability to scale out using load balancing;
CONS: Limited support from .NET infrastructure; potentially fragile when data structures change; places additional load on the database; significantly more complicated
3. Cache on the client (using JSON or XML serialization), use client-side Javascript to paginate, sort, filter, and select results.
PROS: User experience can approach "rich client" levels; most browsers can handle JSON/XML natively - decent libraries exist for manipulation (e.g. jQuery)
CONS: Initial request may take a long time to download; significant memory footprint on client machines; will require hand-crafted Javascript at some level to implement
4. Cache on the client using a compressed/encoded representation of the data - call back into server to decode when switching pages, sorting, filtering, and searching.
PROS: Minimized memory impact on server; allows state to live as long as client needs it; slightly improved memory usage on client over JSON/XML
CONS: Large data sets moving back and forth between client/server; slower performance (due to network I/O) as compared with pure client-side caching using JSON/XML; much more complicated to implement - limited support from .NET/browser
5. Some alternative caching scheme I haven't considered...
For #1, have you considered using a state server (even SQL server) or a shared cache mechanism? There are plenty of good ones to choose from, and Velocity is getting very mature - will probably RTM soon. A cache invalidation scheme that is based on whether the user creates a new search, hits any other page besides search pagination, and finally a standard timeout (20 minutes) should be pretty successful at weeding your cache down to a minimal size.
References:
SharedCache (FOSS)
NCache ($995/CPU)
StateServer (~$1200/server)
StateMirror ("Enterprise pricing")
Velocity (Free?)
If you are able to wait until March 2010, .NET 4.0 comes with a new System.Caching.CacheProvider, which promises lots of implementations (disk, memory, SQL Server/Velocity as mentioned).
There's a good slideshow of the technology here. However, it is a little bit of "roll your own", or a lot of it in fact. But there will probably be a lot of closed and open source providers written for the Provider model when the framework is released.
For the six points you state, a few questions crop up:
What is contained in the search results? Just string data or masses of metadata associated with each result?
How big is the set you're searching?
How much memory would you use storing the entire set in RAM? Or at least keeping a cache of the most popular 10 to 100 search terms? Being smart and caching related searches after the first search might be another idea.
5-15 seconds for a result is a long time to wait for a search so I'm assuming it's something akin to an expedia.com search where multiple sources are being queried and lots of information returned.
From my limited experience, the biggest problem with the client-side only caching approach is Internet Explorer 6 or 7. Server only and HTML is my preference with the entire result set in the cache for paging, expiring it after some sensible time period. But you might've tried this already and seen the server's memory getting eaten.
Raising an idea under the "alternative" caching scheme. This doesn't answer your question with a given cache architecture, but rather goes back to your original requirements of your search application.
Even if/when you implement your own cache, its effectiveness can be less than optimal -- especially as your search index grows in size. Cache hit rates will decrease as your index grows. At a certain inflection point, your search may actually slow down due to resources dedicated to both searching and caching.
Most search sub-systems implement their own internal caching architecture as a means of efficiency in operation. Solr, an open-source search system built on Lucene, maintains its own internal cache to provide for speedy operation. There are other search systems that would work for you, and they take similar strategies to results caching.
I would recommend you consider a separate search architecture if your search index warrants it, as caching in a free-text keyword search basis is a complex operation to effectively implement.
Since you say any ideas are welcome:
We have been using the enterprise library caching fairly successfully for caching result sets from a LINQ result.
http://msdn.microsoft.com/en-us/library/cc467894.aspx
It supports custom cache expiration, so should support most of your needs (with a little bit of custom code) there. It also has quite a few backing stores including encrypted backing stores if privacy of searches is important.
It's pretty fully featured.
My recommendation is a combination of #1 and #3:
Cache the query results on the server.
Make the results available as both a full page and as a JSON view.
Cache each page retrieved dynamically at the client, but send a REQUEST each time the page changes.
Use ETAGs to do client cache invalidation.
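A minimal sketch of how the steps above might fit together on the server: slice one page out of the cached result list and derive an ETag from the page content, so a repeated request for an unchanged page can be answered with 304 Not Modified. `CachedSearchPager` and `ResultPage` are hypothetical names, results are plain strings to keep it short, and the header comparison ignores weak validators and multi-value If-None-Match headers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Invented type: one page of results plus the values the response needs.
public sealed class ResultPage
{
    public List<string> Items;
    public int TotalCount;
    public string ETag;
}

public static class CachedSearchPager
{
    // cachedResults: the full result list already held in the server cache.
    public static ResultPage GetPage(
        IReadOnlyList<string> cachedResults, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var items = cachedResults
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();

        return new ResultPage
        {
            Items = items,
            TotalCount = cachedResults.Count,
            ETag = ComputeETag(items)
        };
    }

    // A quoted hash of the page content; it changes only when the page changes.
    private static string ComputeETag(IEnumerable<string> items)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(
                Encoding.UTF8.GetBytes(string.Join("\n", items)));
            return "\"" + Convert.ToBase64String(hash) + "\"";
        }
    }

    // True when the client's If-None-Match value matches, so a 304 is enough.
    public static bool IsNotModified(string ifNoneMatchHeader, ResultPage page)
    {
        return string.Equals(ifNoneMatchHeader, page.ETag, StringComparison.Ordinal);
    }
}
```

The client still sends a request on every page change, as recommended above, but for a page it has already seen the server only has to compare two short strings and return an empty response.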
Have a look at SharedCache- it makes 1/2 pretty easy and works fine in a load balanced system. Free, open source, and we've been using it for about a year with no issues.
While pondering your options, consider that no user wants to page through data. We force that on them as an artifact of trying to build applications on top of browsers in HTML, which inherently do not scale well. We have invented all sorts of hackery to fake application state on top of this, but it is essentially a broken model.
So, please consider implementing this as an actual rich client in Silverlight or Flash. You will not beat that user experience, and it is simple to cache data much larger than is practical in a regular web page. Depending on the expected user behavior, your overall bandwidth could be optimized because the round trips to the server will get only a tight data set instead of any ASP.NET overhead.
