Is application state (http://msdn.microsoft.com/en-us/library/ms178594.aspx) the same as using the System.Web.Caching API?
i.e.
System.web.httpcontent.current.cache[somekey] ?
The answer is there in your own link. Read it.
However, storing large blocks of data in application state can fill up server memory, causing the server to page memory to disk. As an alternative to using application state, you can use the ASP.NET cache mechanism for storing large amounts of application data. The ASP.NET cache also stores data in memory and is therefore very fast; however, ASP.NET actively manages the cache and will remove items when memory becomes scarce.
No, they are not the same.
The ASP.NET Cache object is specifically optimized for caching content or objects that are short-lived or can live for a defined amount of time. Items can be evicted when ASP.NET needs to free up memory, and it is never guaranteed that something you put in the Cache will be there the next time you look.
Application state (HttpApplicationState) is a global collection of key/value pairs that can be used to store information shared by all users; it can be used in a thread-safe way as long as you lock around updates. However, nothing is ever removed from Application state unless you explicitly remove it.
You probably mean System.Web.HttpContext.Current.Cache, not httpcontent, and the Cache is different from HttpApplicationState.
The application state is there for items that remain fairly static for the lifetime of the application (unless explicitly removed). As can be read in the page you have linked from, the recommendation is to use Application:
to store small amounts of often-used data that does not change from one user to another.
Cache is more transient in nature and is supposed to be used for content that has much shorter life (seconds to minutes) and removal of items is managed automatically (dependent on configuration).
It's not the same.
If the data
is stable during the life of the application
must always be available and must not be purged
you'd use HttpApplicationState.
If the data
is not necessarily needed for the life of the application
changes frequently
can be purged if needed (for example low system memory)
can be discarded if seldom used
should be invalidated/refreshed under some conditions (dependency rule: time span, date, file timestamp, ...)
then use Cache.
Other important differences:
Large amounts of data may be better stored in Cache; the server can then purge it if low on memory.
Cache is safe for multithreaded operations. Page.Application needs locking.
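To make the difference concrete, here is a minimal sketch of both stores; the key names and the 10-minute expiration are made up for illustration:

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class StateExamples
    {
        // Application state: lives for the lifetime of the application,
        // is never evicted automatically, and writes need explicit locking.
        public static void StoreGlobal(HttpContext context, object settings)
        {
            context.Application.Lock();
            try
            {
                context.Application["SiteSettings"] = settings;
            }
            finally
            {
                context.Application.UnLock();
            }
        }

        // Cache: thread-safe for callers, and ASP.NET may remove the item
        // under memory pressure or when the expiration/dependency fires.
        public static void StoreTransient(HttpContext context, object report)
        {
            context.Cache.Insert(
                "DailyReport",
                report,
                null,                               // no CacheDependency
                DateTime.UtcNow.AddMinutes(10),     // absolute expiration
                Cache.NoSlidingExpiration);
        }
    }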
See also this article on etutorials.org for more details.
And this question:
ASP.NET Page.Cache versus Page.Application storage for data synchronization?
Actually I would say that the main purpose of Application state is backwards compatibility with classic ASP applications, and I would never use it in new ASP.NET apps.
As other respondents have indicated, the Cache is actively managed so that data will be discarded if memory is scarce.
Application state is essentially equivalent to a static Hashtable, with locking semantics that are inherited from classic ASP.
If you need to store static data, it's almost always better to store it as a strongly-typed static field of a class rather than using Application state. If you need locking, use the standard synchronisation mechanisms of .NET.
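As a rough sketch of that suggestion (the LookupTables name and LoadFromDatabase loader are hypothetical), a strongly-typed static field with standard .NET locking might look like this:

    using System.Collections.Generic;

    public static class LookupTables
    {
        private static readonly object _sync = new object();
        private static Dictionary<int, string> _countries;

        public static Dictionary<int, string> Countries
        {
            get
            {
                lock (_sync)    // standard .NET synchronisation instead of Application.Lock
                {
                    if (_countries == null)
                        _countries = LoadFromDatabase();
                    return _countries;
                }
            }
        }

        private static Dictionary<int, string> LoadFromDatabase()
        {
            // Placeholder: a real application would query the database here.
            return new Dictionary<int, string> { { 1, "Example" } };
        }
    }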
In my application I use a dictionary (supporting adding, removing, updating and lookup) where both keys and values are or can be made serializable (values can possibly be quite large object graphs). I came to a point when the dictionary became so large that holding it completely in memory started to occasionally trigger OutOfMemoryException (sometimes in the dictionary methods, and sometimes in other parts of code).
After an attempt to completely replace the dictionary with a database, performance dropped down to an unacceptable level.
Analysis of the dictionary usage patterns showed that usually a smaller part of values are "hot" (are accessed quite often), and the rest (a larger part) are "cold" (accessed rarely or never). It is difficult to say when a new value is added if it will be hot or cold, moreover, some values may migrate back and forth between hot and cold parts over time.
I think that I need an implementation of a dictionary that is able to flush its cold values to a disk on a low memory event, and then reload some of them on demand and keep them in memory until the next low memory event occurs when their hot/cold status will be re-assessed. Ideally, the implementation should neatly adjust the sizes of its hot and cold parts and the flush interval depending on the memory usage profile in the application to maximize overall performance. Because several instances of a dictionary exist in the application (with different key/value types), I think, they might need to coordinate their workflows.
Could you please suggest how to implement such a dictionary?
Compile for 64 bit, deploy on 64 bit, add memory. Keep it in memory.
Before you grow your own, you may alternatively look at WeakReference http://msdn.microsoft.com/en-us/library/ms404247.aspx. It would of course require you to rebuild those objects that were reclaimed, but one should hope that those which are reclaimed are not used much. It comes with the caveat that its own guidelines state to avoid using weak references as an automatic solution to memory management problems. Instead, develop an effective caching policy for handling your application's objects.
Of course you can ignore that guideline and effectively work your code to account for it.
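For example, a very small sketch of that idea, where the rebuild delegate is whatever reconstructs one of your large values (the names here are hypothetical):

    using System;
    using System.Collections.Generic;

    public class WeakValueCache<TKey, TValue> where TValue : class
    {
        private readonly Dictionary<TKey, WeakReference> _refs =
            new Dictionary<TKey, WeakReference>();

        public TValue GetOrRebuild(TKey key, Func<TKey, TValue> rebuild)
        {
            WeakReference wr;
            if (_refs.TryGetValue(key, out wr))
            {
                var value = wr.Target as TValue;    // null if the GC reclaimed it
                if (value != null)
                    return value;
            }

            var fresh = rebuild(key);               // rebuild the reclaimed value
            _refs[key] = new WeakReference(fresh);
            return fresh;
        }
    }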
You can implement the caching policy and, upon expiry, save to the database; on fetch, get and cache. Use a sliding expiry of course, since you are concerned with keeping those most used.
Do remember however that most used vs. heaviest is a trade-off. Losing an object 10 times a day that takes 5 minutes to restore would annoy users much more than losing an object 10,000 times which took just 5 ms to restore.
And someone above mentioned the web cache. It does automatic memory management with callbacks as noted; it depends on whether you want to lug that one around in your apps.
And...last but not least, look at a distributed cache. With sharding you can split that big dictionary across a few machines.
Just an idea - never did that and never used System.Runtime.Caching:
Implement a wrapper around MemoryCache which will:
Add items with an eviction callback specified. The callback will place evicted items to the database.
Fetch item from database and put back into MemoryCache if the item is absent in MemoryCache during retrieval.
If you expect a lot of requests for items missing from both the database and memory, you'll probably need to implement either a Bloom filter or a cache of present/missing keys as well.
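A rough, untested sketch of that wrapper (SaveToDatabase and LoadFromDatabase are hypothetical, as is the sliding expiration):

    using System;
    using System.Runtime.Caching;

    public class WriteBehindCache
    {
        private readonly MemoryCache _cache = MemoryCache.Default;

        public void Put(string key, object value)
        {
            var policy = new CacheItemPolicy
            {
                SlidingExpiration = TimeSpan.FromMinutes(10),   // assumed window
                // Evicted items are pushed to the database by the callback.
                RemovedCallback = args =>
                    SaveToDatabase(args.CacheItem.Key, args.CacheItem.Value)
            };
            _cache.Set(key, value, policy);
        }

        public object Get(string key)
        {
            var value = _cache.Get(key);
            if (value == null)
            {
                value = LoadFromDatabase(key);      // fall back to the database
                if (value != null)
                    Put(key, value);                // and re-cache it
            }
            return value;
        }

        private void SaveToDatabase(string key, object value) { /* persist */ }
        private object LoadFromDatabase(string key) { return null; /* fetch */ }
    }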
I had a similar problem in the past.
The concept you are looking for is a read-through cache with an LRU (Least Recently Used) queue.
Is it there any LRU implementation of IDictionary?
As you add things to your dictionary, keep track of which ones were used least recently, remove them from memory, and persist those to disk.
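A bare-bones illustration of that LRU bookkeeping (the caller decides what to do with the key returned by TakeColdest, e.g. persist the value to disk):

    using System.Collections.Generic;

    public class LruIndex<TKey>
    {
        private readonly LinkedList<TKey> _order = new LinkedList<TKey>();
        private readonly Dictionary<TKey, LinkedListNode<TKey>> _nodes =
            new Dictionary<TKey, LinkedListNode<TKey>>();

        // Call this on every add or lookup so the key moves to the front.
        public void Touch(TKey key)
        {
            LinkedListNode<TKey> node;
            if (_nodes.TryGetValue(key, out node))
                _order.Remove(node);
            _nodes[key] = _order.AddFirst(key);
        }

        // Returns the least recently used key so it can be flushed to disk.
        public TKey TakeColdest()
        {
            var coldest = _order.Last.Value;
            _order.RemoveLast();
            _nodes.Remove(coldest);
            return coldest;
        }
    }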
I have an application which queries the database for records. The records can number in the thousands, and this can drive up the memory of the process and eventually lead to a crash or slow responses.
A paginated query is a solution for this, but the information in the records keeps changing. Hence, to give a consistent experience, we are forced to show the information available at the time the user made the query.
Employing paging would dynamically update the content when moving from page to page. I believe client-side caching could solve this problem.
One way I am considering is to store the results to disk in XML format and query them using LINQ to XML. Are there any proven client-side caching mechanisms which can work with a desktop application (not web)?
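For what it's worth, a minimal sketch of the XML-on-disk idea mentioned above (the Record shape and element names are assumptions):

    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    public class Record { public int Id; public string Name; }

    public static class XmlResultCache
    {
        // Snapshot the query results at the time the user ran the query.
        public static void Save(string path, IEnumerable<Record> records)
        {
            var doc = new XDocument(
                new XElement("Records",
                    records.Select(r =>
                        new XElement("Record",
                            new XAttribute("Id", r.Id),
                            new XElement("Name", r.Name)))));
            doc.Save(path);
        }

        // Later pages query the saved snapshot with LINQ to XML.
        public static IEnumerable<Record> Load(string path)
        {
            return XDocument.Load(path)
                .Descendants("Record")
                .Select(x => new Record
                {
                    Id = (int)x.Attribute("Id"),
                    Name = (string)x.Element("Name")
                });
        }
    }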
See a pattern like http://msdn.microsoft.com/en-us/library/ff664753
It talks about the use of the Enterprise Library Caching Application Block that lets developers incorporate a local cache in their applications.
Read also http://www.codeproject.com/Articles/8977/Using-Cache-in-Your-WinForms-Applications
Enterprise Library 5.0 can be found here http://msdn.microsoft.com/en-us/library/ff632023
Memory usage shouldn't really be an issue unless you are letting your cache grow indefinitely. There is little benefit to pre-fetching too many pages the user may never see, or in holding on to pages that the user has not viewed for a long time. Dynamically fetching the next/previous page would keep performance high, but you should clear from the cache pages that have been edited or are older than a certain timespan. Clearing from the cache simply requires discarding all references to the page (e.g. removing it from any lists or dictionaries) and allowing the garbage collector to do its work.
You can also potentially store a WeakReference to your objects and let the garbage collector collect your objects if it needs to, but this gives you less control over what is and isn't cached.
Alternatively, there are some very good third-party solutions for this, especially if it's a grid control. The DevExpress grid controls have an excellent server mode that can handle very large data sets with good performance.
I have inherited a project from a developer who was rather fond of session variables. He has used them to store all sorts of global stuff - datatables, datasets, locations of files, connection strings etc. I am a little worried that this may not be very scalable and we do have the possibility of a lot more users in the immediate future.
Am I right to be concerned, and if so why?
Is there an easy way to see how much memory this is all using on the live server at the moment?
What would be the best approach for re-factoring this to use a better solution?
Yes, I would say that you do have some cause for concern. Overuse of session can cause a lot of performance issues. Ideally, session should only be used for information that is specific to the user. Obviously there are exceptions to this rule, but keep that in mind when you're refactoring.
As for the refactoring itself, I would look into caching any large objects that are not user-specific, and removing anything that doesn't need to be in session. Don't be afraid to make a few trips to the database to retrieve information when you need it. Go with the option that puts the least overall strain on the server. The trick is keeping it balanced and distributing the weight as evenly as possible across the various layers of the application.
It was probably due to poor design, and yes you should be concerned if you plan on getting heavier traffic or scaling the site.
Connection strings should be stored in web.config. Seems like you would have to do some redesigning of the data-layer and how the pages pass data to each other to steer away from storing datatables and datasets in Session. For example, instead of storing a whole dataset in Session, store, or pass by url, something small (like an ID) that can be used to re-query the database.
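A small sketch of both suggestions (the "MyDb" connection string name and the Customers query are hypothetical):

    using System.Configuration;
    using System.Data.SqlClient;

    public static class DataAccess
    {
        // Read the connection string from web.config instead of Session.
        public static string GetConnectionString()
        {
            return ConfigurationManager.ConnectionStrings["MyDb"].ConnectionString;
        }

        // Re-query by the small ID passed on the URL rather than carrying
        // a whole DataSet around in Session.
        public static string GetCustomerName(int customerId)
        {
            using (var conn = new SqlConnection(GetConnectionString()))
            using (var cmd = new SqlCommand(
                "SELECT Name FROM Customers WHERE Id = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", customerId);
                conn.Open();
                return (string)cmd.ExecuteScalar();
            }
        }
    }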
Sessions always hurt scalability. However, once sessions are being used, the impact of a little bit more data in a session isn't that bad.
Still, it has to be stored somewhere, has to be retrieved from somewhere, so it's going to have an impact. It's going to really hurt if you have to move to a web-farm to deal with being very successful, since that's harder to do well in a scalable manner. I'd start by taking anything that should be global in the true sense (shared between all sessions) and move it into a truly globally-accessible location.
Then, anything that depended upon the previous request, I'd have sent by that request.
Doing both of those would reduce the amount they were used for immensely (perhaps enough to turn off sessions and get the massive scalability boost that gives).
Depending on the IIS version, using Session to store state can have an impact on scaling. The later versions of IIS are better.
However, the main problem I have run into is that sessions expire and then your data is lost; you may provide your own Session_OnEnd handler where it is possible to regenerate your session.
Overall yes, you should be concerned about this.
Session is a "per user" type of storage that is in memory. Looking at the memory usage of the ASP.NET worker process will give you an idea of memory usage, but you might need to use third-party tools if you want to dig deeper into what is in it. In addition, session gets really "fun" when you start load balancing, etc.
ConnectionStrings and other information that is not "per user" should really not be handled in a "per user" storage location.
As for creating a solution for this though, a lot is going to depend on the data itself, as you might need to find multiple other opportunities/locations to get/store the info.
You are right in feeling concerned about this.
Connection strings should be stored in Web.config and always read from there. The Web.config file is cached, so storing things in there and then in Session is redundant and unnecessary. The same can be said for locations of files: you can probably create key/value pairs in the appSettings section of your web.config to store this information.
As far as storing datasets, datatables, etc., I would only store this information in Session if getting them from the database is really expensive, and provided the data is not too big. A lot of people tend to do this kind of thing without realizing that their queries are very fast and that database connections are pooled.
If getting the data from the database does take long, the first thing I would try to remedy would be the speed of my queries. Am I missing indexes? What does the execution plan of my queries show? Am I doing table scans, etc., etc.
One scenario where I currently store information in Session (or Cache) is when I have to call an external web service that takes more than 2 seconds on average to retrieve what I need. Once I get this data I don't need to get it again on every page hit, so I cache it.
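As a hedged sketch of that scenario (CallExternalService and the 30-minute expiration are assumptions):

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class ServiceResultCache
    {
        public static ExternalData GetData()
        {
            var cache = HttpRuntime.Cache;
            var data = cache["ExternalData"] as ExternalData;
            if (data == null)
            {
                data = CallExternalService();       // the slow (>2 sec) call
                cache.Insert("ExternalData", data, null,
                             DateTime.UtcNow.AddMinutes(30),
                             Cache.NoSlidingExpiration);
            }
            return data;
        }

        private static ExternalData CallExternalService()
        {
            return new ExternalData();              // placeholder for the real call
        }
    }

    public class ExternalData { }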
Obviously an application that stores pretty much everything it can on Session is going to have scalability issues because memory is a limited resource.
If memory is the issue, why not change the session mode to SQL Server, so you can store session data in SQL Server? This requires only small code changes.
How to store session data in SQL Server:
http://msdn.microsoft.com/en-us/library/ms178586.aspx
The catch is that the classes stored in SQL Server must be serializable; you can use Json.NET to do just that.
We have an ASP.NET 4.0 application that draws from a database a complex data structure that takes over 12 hours to push into an in-memory data structure (which is later stored in HttpRuntime.Cache). The size of the data structure is quickly increasing, and we can't continue waiting 12+ hours to get it into memory if the application restarts. This is a major issue if you want to change the web.config or any code in the web application that causes a restart: it means a long wait before the application can be used, and hinders development and updating the deployment.
The data structure MUST be in memory to work at a speed that makes the website usable. In memory databases such as memcache or Redis are slow in comparison to HttpRuntime.Cache and would not work in our situation: they have to serialize on put/get, the stored objects can't reference each other since everything is a key lookup (which degrades performance), and with a large number of keys the performance goes down quickly. Performance is a must here.
What we would like to do is quickly dump the HttpRuntime.Cache to disk before the application ends (on a restart), and be able to load it back immediately when the application starts again (hopefully within minutes instead of 12+ hours or days).
The in-memory structure is around 50GB.
Is there a solution to this?
In memory databases such as memcache or Redis are slow in comparison to HttpRuntime.Cache
Yes, but they are very fast compared to a 12+ hour spin-up. Personally, I think you're taking the wrong approach here in forcing load of a 50 GB structure. Just a suggestion, but we use HttpRuntime.Cache as part of a multi-tier caching strategy:
local cache is checked etc first
otherwise redis is used as the next tier of cache (which is faster than the underlying data, persistent, and supports a number of app servers) (then local cache is updated)
otherwise, the underlying database is hit (and then both redis and local cache are updated)
The point being, at load we don't require anything in memory - it is filled as it is needed, and from then on it is fast. We also use pub/sub (again courtesy of redis) to ensure cache invalidation is prompt. The net result: it is fast enough when cold, and very fast when warm.
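In very rough terms, the lookup described above could look like this (a sketch assuming StackExchange.Redis and MemoryCache; the expirations, key handling, and connection setup are simplified):

    using System;
    using System.Runtime.Caching;
    using StackExchange.Redis;

    public class TieredCache
    {
        private readonly MemoryCache _local = MemoryCache.Default;
        private readonly IDatabase _redis =
            ConnectionMultiplexer.Connect("localhost").GetDatabase();

        public string Get(string key, Func<string, string> loadFromDatabase)
        {
            // 1. Local in-process cache first.
            var value = _local.Get(key) as string;
            if (value != null) return value;

            // 2. Redis as the shared second tier.
            RedisValue fromRedis = _redis.StringGet(key);
            if (fromRedis.HasValue)
            {
                value = fromRedis;
            }
            else
            {
                // 3. Hit the underlying database, then populate redis.
                value = loadFromDatabase(key);
                _redis.StringSet(key, value, TimeSpan.FromHours(1));
            }

            // Backfill the local cache on the way out.
            _local.Set(key, value, DateTimeOffset.UtcNow.AddMinutes(5));
            return value;
        }
    }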
Basically, I would look at anything that avoids needing the 50GB data before you can do anything.
If this data isn't really cache, but is your data, I would look at serialization on a proper object model. I would suggest protobuf-net (I'm biased as the author) as a strong candidate here - very fast and very small output.
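For illustration, a minimal protobuf-net sketch of dumping an object model to disk and reloading it at startup (CacheSnapshot is a hypothetical root type):

    using System.Collections.Generic;
    using System.IO;
    using ProtoBuf;

    [ProtoContract]
    public class CacheSnapshot
    {
        [ProtoMember(1)]
        public Dictionary<string, string> Entries { get; set; }
    }

    public static class SnapshotStore
    {
        public static void Save(string path, CacheSnapshot snapshot)
        {
            using (var file = File.Create(path))
                Serializer.Serialize(file, snapshot);   // compact binary output
        }

        public static CacheSnapshot Load(string path)
        {
            using (var file = File.OpenRead(path))
                return Serializer.Deserialize<CacheSnapshot>(file);
        }
    }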
In C#, I had to create my own dynamic memory management. For that reason I have created a static memory manager and a MappableObject. All objects that should be dynamically mappable to and unmappable from the hard disk implement this interface.
This memory management is only done for these large objects that have the ability to unmap/map their data from the hard disk. Everything else uses the regular GC, of course.
Every time a MappableObject is allocated it asks for memory. If no memory is available, the MemoryManager dynamically unmaps some data to the hard disk to free enough memory to allocate a new MappableObject.
A problem in my case is that I can have more than 100,000 MappableObject instances (scattered over a few files, ~10-20 files), and every time I need to unmap some data I have to run through a list of all objects. Is there a way to get all allocated objects that are created in my current instance?
In fact, I don't know what's easier: to keep my own list or to run through the objects (if possible). How would you solve such things?
Update
The reason is that I have a large amount of data, about 100 GB, that I need to keep during my run. Therefore I need to hold references to the data, and so the GC is not able to free the memory. In fact, C# manages memory pretty well, but in such memory-exhausting applications the GC copes really badly. Of course I tried to use MemoryFailPoint, but this slows down my allocations tremendously and does not give correct results for whatever reason. I have also tried MemoryMappedFiles, but since I have to access the data randomly it doesn't help. Also, MemoryMappedFiles only allow ~5000 file handles (on my system), and this is not enough.
Is there a ROT (Running Object Table) in .Net? The short answer is no.
You would have to maintain this information yourself.
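One way to maintain it yourself, sketched here as a base class with a weak-reference registry (an assumption about how your MappableObject might be structured):

    using System;
    using System.Collections.Generic;

    public abstract class MappableObject
    {
        private static readonly List<WeakReference> _instances = new List<WeakReference>();
        private static readonly object _sync = new object();

        protected MappableObject()
        {
            lock (_sync)
                _instances.Add(new WeakReference(this));    // register on creation
        }

        // Returns the instances that are still alive, pruning dead entries.
        public static List<MappableObject> LiveInstances()
        {
            lock (_sync)
            {
                _instances.RemoveAll(wr => !wr.IsAlive);
                var result = new List<MappableObject>();
                foreach (var wr in _instances)
                {
                    var obj = wr.Target as MappableObject;
                    if (obj != null)
                        result.Add(obj);
                }
                return result;
            }
        }
    }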
Given your question update, could you not store your data in a database and use some sort of in-memory cache (perhaps with weak references or MFU, etc) to try and keep hot data close to you?
This is an obvious case for a classic cache. Your data is stored in a database or indexed flat file while you maintain a much smaller number of entries in RAM.
To implement a cache for your program I would create a class that implements IDictionary. Reserve a certain amount of slots in your cache, say a number of elements that would cause about 100 MB of RAM to be allocated; make this cache size an adjustable parameter.
When you override this[], if the object requested is in the cache, return it. If the object requested is not in the cache, remove the least recently used cached value, add the requested value as the most recently used value, and return it. Functions like Remove() and Add() not only adjust the memory cache, but also manipulate the underlying database or flat file on disk.
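A condensed sketch of that class (the capacity, the LoadFromDisk/SaveToDisk calls, and the O(n) usage-list update are all simplifications):

    using System.Collections.Generic;

    public class DiskBackedCache<TKey, TValue>
    {
        private readonly int _capacity;
        private readonly Dictionary<TKey, TValue> _hot = new Dictionary<TKey, TValue>();
        private readonly LinkedList<TKey> _usage = new LinkedList<TKey>();

        public DiskBackedCache(int capacity) { _capacity = capacity; }

        public TValue this[TKey key]
        {
            get
            {
                TValue value;
                if (!_hot.TryGetValue(key, out value))
                {
                    value = LoadFromDisk(key);      // read through on a miss
                    Add(key, value);
                }
                MarkUsed(key);
                return value;
            }
            set { Add(key, value); MarkUsed(key); }
        }

        private void Add(TKey key, TValue value)
        {
            if (!_hot.ContainsKey(key) && _hot.Count >= _capacity)
            {
                var coldest = _usage.Last.Value;    // least recently used key
                SaveToDisk(coldest, _hot[coldest]); // spill it to disk
                _hot.Remove(coldest);
                _usage.RemoveLast();
            }
            _hot[key] = value;
        }

        private void MarkUsed(TKey key)
        {
            _usage.Remove(key);                     // O(n); fine for a sketch
            _usage.AddFirst(key);
        }

        private TValue LoadFromDisk(TKey key) { return default(TValue); }
        private void SaveToDisk(TKey key, TValue value) { /* persist */ }
    }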
It's true that your program might still hold references to objects you removed from the cache, but if so, your program is still using them; garbage collection will clean them up once they are no longer referenced.
Caches like this are easier to implement in C# because of its strong OOP features and safety.