I have a scenario to optimise how my web app is storing data in the session and retrieving it. I should point out that I'm using SQL Server as my session store.
My scenario is that I need to store a list of unique IDs mapped to string values in the user's session for later use. The current code I've inherited uses a List<T> with a custom object, but I can already see that some kind of dictionary would be far better for lookup performance than a linear search over a list.
I've tested two ideas for alternatives:
Storing a Dictionary<int, string> in the session. When I need to get the strings back, I get the dictionary from the session once and can test each ID on the dictionary object.
Since the session is basically like a dictionary itself, store each string directly in the session using a unique session key, e.g. Session["MyString_<id>"] = stringValue. Getting the value out of the session would basically be the inverse operation.
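In code, the two approaches look roughly like this. This is a self-contained sketch that models the session as a plain dictionary so it can run outside ASP.NET; the key names ("MyStrings", "MyString_<id>") are illustrative, not from any real API.

```csharp
using System;
using System.Collections.Generic;

class SessionStorageSketch
{
    // Stand-in for HttpSessionState so the sketch is self-contained.
    static readonly Dictionary<string, object> Session = new Dictionary<string, object>();

    static void Main()
    {
        // Approach 1: one Dictionary<int, string> under a single session key.
        Session["MyStrings"] = new Dictionary<int, string> { { 1, "one" }, { 2, "two" } };
        var map = (Dictionary<int, string>)Session["MyStrings"];
        Console.WriteLine(map[2]); // "two"

        // Approach 2: one session entry per ID, using a composed key.
        Session["MyString_1"] = "one";
        Session["MyString_2"] = "two";
        Console.WriteLine((string)Session["MyString_2"]); // "two"
    }
}
```

With a real SQL Server session store, approach 1 serialises and deserialises the whole dictionary on each request that touches it, while approach 2 serialises each string independently.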
My test results show the following based on the operation I need to do and using 100 strings:
Dictionary - 4552 bytes, 0.1071 seconds to do operation
Session Direct - 4441 bytes, 0.0845 seconds to do operation
From these results I see that I save some space in the session (probably because I don't have the overhead of serialising a dictionary object), and getting the values back from the session also seems faster, maybe because strings are quicker to deserialise than objects.
So my question is, is it better for performance to store lots of smaller objects in session rather than one big one? Is there some disadvantage for storing lots of smaller objects vs. one bigger object that I haven't seen?
There are penalties for serializing and searching large objects (they take up more space and processor time due to the need to represent a more complex structure).
And why do two lookups when you can do only one?
Also, all the documentation that deals with caching/storage solutions mentions that it is much more efficient to retrieve a single serialised value by a computed key than to store the whole dictionary, retrieve it, and search in it.
I think you have almost answered your own question in showing that yes, there is an overhead with deserialising objects, but I think the real deciding factor should be manageability and maintainability.
The difference in storage size is going to be minimal when you are talking about 100 objects, but as you scale up to thousands of objects the difference will grow too, especially if you are using complex custom objects. If you have an application with many users, each holding thousands of session entries, you can imagine how this just doesn't scale.
Also, by having many session objects you are undoubtedly going to have to write more code to handle each varying object. This may not be a vast amount more, but certainly more. It would also potentially make it harder for a developer picking up your code to understand your reasoning and therefore extend it.
If you can handle the session in a single bare-bones format like an IEnumerable or IDictionary then, in my opinion, that is preferable even if there is a slight overhead involved.
Currently I am working on a caching system using Redis. Initially, I was storing a dictionary of MyClass objects as the value, with a combination of dates as the key. This worked great, but I was concerned about eventually hitting the size limit for a single value. I tried using HashEntries, but it was incredibly inefficient. Now I am trying to store each MyClass object separately, using the original key with the ID appended to it. However, when I do the retrieval from the cache, I need to be able to retrieve values whose key contains the date substring. I read that using Keys() is very slow, which defeats the purpose of the cache.
I read here about using scan and cursors, but couldn't get the keys from the RedisResult.
I was hoping somebody could either help me with the scan, show me a way to get the keys that doesn't hurt performance, or suggest another approach for caching large lists of data.
Well, scanning cache keys is not something that will improve the performance of your application.
What I would suggest is restructuring your cache so that you can address a single record by date.
Could Redis lists or sets be an option for you?
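One way to restructure along these lines, sketched with StackExchange.Redis: keep a Redis SET per date that indexes the IDs stored under that date, so retrieval by date never needs SCAN or KEYS. The key layout ("myapp:item:…", "myapp:ids:…") is an assumption for illustration, not from the original question.

```csharp
using StackExchange.Redis;

static class DateIndexedCache
{
    // Hypothetical key layout: one string per item, one SET per date as an index.
    public static string ItemKey(string date, string id) => $"myapp:item:{date}:{id}";
    public static string IndexKey(string date) => $"myapp:ids:{date}";

    public static void Store(IDatabase db, string date, string id, string json)
    {
        db.StringSet(ItemKey(date, id), json);
        db.SetAdd(IndexKey(date), id);        // maintain the per-date index
    }

    public static RedisValue[] IdsForDate(IDatabase db, string date)
    {
        return db.SetMembers(IndexKey(date)); // O(set size); no key scan needed
    }
}
```

Given the IDs for a date, the individual values can then be fetched with StringGet (or a batched StringGet over all the composed keys), which keeps every operation an O(1)/O(n) direct lookup.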
I need to cache information about user roles in ASP.NET Web API. I have decided to use System.Web.Helpers.WebCache class. Role is plain string, which is about 40 character long. Each user may have between 1-10 roles.
I am thinking of two ways to do this:
Use WebCache.Set(UserID, List<String>), i.e. use the user ID as the key and store the list of role strings as the value. It's easy to retrieve.
Use a dictionary, where I use the user ID as the key and the list of roles as the value, and then cache the whole dictionary. This way I am caching under only one key. When I retrieve this information, I first retrieve the dictionary and then use the user ID to get the role information.
Questions:
Which approach is better? I like approach one as it's easy to use. Does it have any downside?
The way I estimated the memory needed to keep these keys in the cache was by putting the same amount of data (10 roles stored as strings) into a Notepad file and then checking the file's size (using UTF-8 encoding). The size was about 500 bytes (4 KB on disk). Then, for 200 users, I multiply 200 * 500 bytes to estimate the memory usage. Is this a reasonable (approximately close is fine) way to calculate?
I prefer the approach of saving individual keys instead of saving the roles of all users as a single cache object.
Following are the reasons:
1) Creation is simple: when the user logs in, or at another appropriate moment, the cache is checked and, if empty, populated for that user; there is no need to iterate through a dictionary object (or use LINQ) to get to that key item.
2) When the user logs off, or at another appropriate moment, that user's cache entry is destroyed completely, instead of removing one key from a shared cache object.
3) There is also no need to lock the object when multiple users try to access it at the same time, and this scenario will happen. Since an object is created per user, there is no risk of contention on a shared object and no need for synchronization or a mutex.
1. Solution one is preferable. It is straightforward and appears to offer only advantages.
2. Your calculation makes sense for option 1 but not for option 2. A C# dictionary uses hashing, which takes up more memory; for primitive, short data like this, the space taken by the hashes may be a significant relative increase.
For this type of application, memory usage measured in individual bytes would typically be a secondary concern compared to maintainability and functionality. User roles are often core functionality with fairly large security implications, and as the project grows it will become very important that the code stays maintainable and secure.
Caching should be used exclusively as an optimization, and because this involves small amounts of data for a relatively small user base (~200 people), it would be much better to make your caching of these roles granular and easy to refetch.
According to the official documentation for this library (Microsoft's System.Web.Helpers.WebCache):
"In general, you should never count on an item that you have cached to be in the cache."
And because I'll assume that user roles define some fairly important functionality, it would be better to add queries for these roles to your Web API requests instead of relying on a local cache.
However, if you are dead set on using this cache (and refetching should it ever disappear), then based on your question, option one would be the preferable choice.
This is because a list takes less memory and, in this case, appears more straightforward; I see no benefit from using a dictionary.
Dictionaries shine when you have large datasets and need speed, but for this scenario where all data is already being stored in memory and the data set is relatively small, a dictionary introduces complexity and higher memory requirements and not much else. Though the memory usage sounds negligible in either scenario on most modern devices and servers.
While a dictionary may sound compelling given your need to look up roles by user, the WebCache class already appears to offer that functionality, so an additional dictionary loses its appeal.
Q1: Without knowing the actual usage of the cache items, it is difficult to draw a conclusion. Nonetheless, I think it all comes down to the design of the lifespan of those items. If you want to retire them all at once after a certain period and then query a new set of data, storing a ConcurrentDictionary that maps users to roles in WebCache is the easier solution to manage.
Otherwise, if you want to retire each entry individually according to some event, approach one seems the more straightforward answer. Just be mindful: if you choose approach two, use ConcurrentDictionary instead of Dictionary, because the latter is not thread safe.
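A minimal sketch of the thread-safe variant of approach two; the type and member names here are made up, and the DB loader is a placeholder:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

static class RoleCacheSketch
{
    // One shared, thread-safe map from user ID to that user's roles.
    static readonly ConcurrentDictionary<int, List<string>> RolesByUser =
        new ConcurrentDictionary<int, List<string>>();

    public static List<string> GetRoles(int userId)
    {
        // GetOrAdd is atomic per key, so callers need no explicit locking.
        return RolesByUser.GetOrAdd(userId, id => LoadRolesFromDb(id));
    }

    static List<string> LoadRolesFromDb(int userId)
    {
        return new List<string> { "Reader", "Editor" }; // placeholder for the real query
    }
}
```

Note that GetOrAdd may invoke the value factory more than once under contention, but only one resulting list is ever stored and returned for a given key.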
Q2: WebCache is fundamentally an IEnumerable<KeyValuePair<string, object>>, so it stores the key strings and the memory locations of each value, apart from the metadata of the objects. A ConcurrentDictionary/Dictionary, on the other hand, stores the hash codes of the key strings and the memory locations of each value. While each key's byte[] length is very small, its hash code could be slightly bigger than the size of the string. Otherwise, the sizes of the hash codes are very predictable and reasonably slim (around 10 bytes in my test). Every time you add an entry, the size of the whole collection grows by about 30 bytes. Of course this figure does not include the actual size of the value, as that is irrelevant to the collection.
You can calculate the size of the string by using:
System.Text.Encoding.UTF8.GetByteCount(key);
You might also find this helper useful for estimating the serialized size of an object:
// Requires: using System.IO; and using System.Runtime.Serialization.Formatters.Binary;
// Note: this measures the BinaryFormatter-serialized size, not the in-memory size,
// and the object graph must be marked [Serializable].
static long GetSizeOfObject(object obj)
{
    using (var stream = new MemoryStream())
    {
        var formatter = new BinaryFormatter();
        formatter.Serialize(stream, obj);
        return stream.Length;
    }
}
First of all, make sure there is an abstraction layer so you can easily change the implementation in future.
I can't see any significant difference between these two approaches; both of them use a hash table for lookups.
But the second does the lookup twice, I suppose: once to find the dictionary in the cache, and once to find the user in the dictionary.
In addition, I would recommend:
If there is a huge number of users, store role IDs rather than role strings (with only 1,000-10,000 users there is no sense in doing it).
Do not forget to clear the cache record when a user's roles are updated.
You don't need option 2; option 1 should suffice, as all you need is a key mapped to a List<string>.
A few points to consider in general before using caching:
What is the amount of data being cached?
What mode of caching are you using: in-memory or distributed?
How are you going to manage the cache?
If the data being cached grows beyond a threshold, what is the failover mechanism?
Caching has its pros and cons. In your scenario you have already done the payload analysis, so I don't see any issue with option 1.
I am implementing caching using MemoryCache (.net 4.0) for caching global data which will be used by all users of my website.
My initial approach:
Store a KeyedCollection which would hold a collection of the Customer objects with a key to get a single object. This collection could have up to 250 such objects. Whenever the cache expires, I would rebuild the KeyedCollection and add it to the cache.
New approach
Now I am thinking: why not store each Customer object directly in the cache, with the customer ID as the look-up key? MemoryCache.Default would then hold up to 250 such Customer objects versus a single KeyedCollection.
Benefits:
More efficient, since I will get the Customer object directly from the cache without having to perform another look-up on the KeyedCollection.
I would add a new Customer object to the cache only when it is requested for the first time: lazy addition, as opposed to pre-building the entire cache.
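The lazy, per-object approach can be sketched like this with System.Runtime.Caching (available from .NET 4.0). The Customer type, key prefix, and expiration are assumptions for illustration:

```csharp
using System;
using System.Runtime.Caching;

class Customer { public int Id { get; set; } }

static class CustomerCacheSketch
{
    public static Customer GetCustomer(int customerId)
    {
        string key = "Customer_" + customerId;       // hypothetical key scheme
        var cached = (Customer)MemoryCache.Default.Get(key);
        if (cached != null)
            return cached;                           // cache hit: direct lookup, no collection scan

        var customer = LoadCustomerFromDb(customerId); // cache miss: load once, then cache
        MemoryCache.Default.Add(key, customer,
            new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(20) });
        return customer;
    }

    static Customer LoadCustomerFromDb(int id)
    {
        return new Customer { Id = id };             // placeholder for the real DB call
    }
}
```

Each object then expires on its own schedule, rather than the whole collection being rebuilt at once.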
Any thoughts on using one versus the other in terms of performance and other factors?
The solution will depend on how often you need to work on the objects as a collection.
Reasons for storing as a collection:
Storing each object individually, if all 250 objects are always populated, takes up more space, as each item in the cache has an associated CacheItemPolicy. This case is probably unlikely, however.
You would not have the strongly typed extension methods made available by LINQ on collections. (The extension methods are available, but MemoryCache items are exposed as KeyValuePair<string, object>.)
Reasons for storing individually:
You are only, or mostly, going to be working on a single object at a time.
You want each object to be created and removed from cache based on its own frequency of usage, rather than that of a whole collection.
So compare your likely usage scenario and choose accordingly. Chances are, unless you are writing lots of .Where, .Select, etc. calls, or have reason to pass around the whole collection, storing individually is going to be the better choice.
I want to use a lookup map or dictionary in a C# application, but it is expected to store 1-2 GB of data.
Can someone please tell if I will still be able to use dictionary class, or if I need to use some other class?
EDIT: We have an existing application which uses an Oracle database to query or look up object details. It is, however, too slow, since the same objects are repeatedly queried. I feel it might be ideal to use a lookup map for this scenario to improve the response time, but I am worried the size will make it a problem.
Short Answer
Yes. If your machine has enough memory for the structure (and the overhead of the rest of the program and system including operating system).
Long Answer
Are you sure you want to? Without knowing more about your application, it's difficult to know what to suggest.
Where is the data coming from? A file? Files? A database? Services?
Is this a caching mechanism? If so, can you expire items out of the cache once they haven't been accessed for a while? This way, you don't have to hold everything in memory all the time.
As others have suggested, if you're just trying to store lots of data, can you just use a database? That way you don't have to have all of the information in memory at once. With indexing, most databases are excellent at performing fast retrieves. You could combine this approach with a cache.
Is the data that will be in memory read only, or will it have to be persisted back to some storage when something changes?
Scalability - do you expect that the amount of data stored in this dictionary will increase as time goes on? If so, you're going to hit a point where it's very expensive to buy machines that can handle this amount of data. You might want to look at a distributed caching system in that case (AppFabric comes to mind) so you can scale out horizontally (more machines) instead of vertically (one really big, expensive point of failure).
UPDATE
In light of the poster's edit, it sounds like caching would go a long way here. There are many ways to do this:
Simple dictionary caching: just cache stuff as it's requested.
Memcache
Caching Application Block. I'm not a huge fan of this implementation, but others have had success with it.
As long as you're on a 64-bit machine with enough RAM, yes, you should be able to use a dictionary that large. However, if you have THAT much data, a database may be more appropriate (Cassandra is really nothing but a gigantic dictionary, and there's always MySQL).
When you say 1-2GB of data, I assume that you mean the items are complex objects that cumulatively contain 1-2GB.
Unless they're structs (and they shouldn't be), the dictionary doesn't care how big the items are.
As long as you have fewer than about 2^24 items (I pulled that number out of a hat), you can store as much as you can fit in memory.
However, as everyone else has suggested, you should probably use a database instead.
You may want to use an in-memory database such as SQL CE.
You can, but for a dictionary as large as that you are better off using a database.
Use a database.
Make sure you've a good DB model, put correct indexes, and off you go.
You can use sub-dictionaries:
Dictionary<KeyA, Dictionary<KeyB, TValue>>
where KeyA is some common part of KeyB. For example, if you have a string-keyed dictionary, you can use the first letter as KeyA.
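A minimal sketch of that sharding idea, assuming string keys and values (the class and method names are made up):

```csharp
using System.Collections.Generic;

static class ShardedDictionarySketch
{
    // Outer dictionary keyed by the first letter; each shard holds the full keys.
    static readonly Dictionary<char, Dictionary<string, string>> Shards =
        new Dictionary<char, Dictionary<string, string>>();

    public static void Put(string key, string value)
    {
        char shardKey = key[0];
        Dictionary<string, string> shard;
        if (!Shards.TryGetValue(shardKey, out shard))
        {
            shard = new Dictionary<string, string>();
            Shards[shardKey] = shard;
        }
        shard[key] = value;
    }

    public static bool TryGet(string key, out string value)
    {
        Dictionary<string, string> shard;
        if (Shards.TryGetValue(key[0], out shard))
            return shard.TryGetValue(key, out value);
        value = null;
        return false;
    }
}
```

Each individual dictionary stays smaller, which can help with large-object-heap pressure from the internal bucket arrays, though the total memory used is broadly the same.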
I have 5 types of objects: place info (14 properties), owner company info (5 properties), picture, ratings (storing multiple vote results), and comments.
All five objects combine to make one object (Place), which has all the properties and information about the place: its info, pictures, comments, etc.
What I'm trying to achieve is a page that displays the Place object and all its properties. Another issue: if I want to display the owner companies' profiles, I'll have an object for each owner company (with a sixth property added, a list of all the places they own).
I've been practicing for a while, but I don't yet have much implementation or performance experience, and I sense that this is too much!
What do you think ?
You have to examine the use case scenarios for your solution. Do you need to always show all of the data, or are you starting off with displaying only a portion of it? Are users likely to expand any collapsed items as part of regular usage or is this information only used in less common usages?
Depending on your answers it may be best to fetch and populate the entire page with all of the data at once, or it may be the case that only some data is needed to render the initial screen and the rest can be fetched on-demand.
In most cases the best solution is likely to involve fetching only the required data and to update the page dynamically using ajax queries as needed.
As for optimizing data access, you need to strike a balance between the number of database requests and the complexity of each individual request. Because of network latency it is often important to fetch as much as possible using as few queries as possible, even if this means you'll sometimes be fetching data that you do not always need. But if you include too much data in a single query, then computing all the joins may also be costly. It is quite rare to see a solution in which it is better to first fetch all root objects and then for every element go fetch some additional objects associated with that element. As such, design your solution to fetch all data at once, but include only what you really need and try to keep the number of involved tables to a minimum.
You have 3 issues to deal with really, and they are often split into DAL, BLL and UI
Your objects obviously belong in the BLL, and if you're considering performance then you need to consider how your objects will be created and how they interface with the DAL. I have many objects with 50-200 properties, so 14 properties is really no issue.
The UI side of it is separate; if you're considering the performance of displaying a lot of information on a single page, you'll consider tabbed content, grids, etc.
Tackle it one thing at a time and see where your problems lie.