I need to cache information about user roles in an ASP.NET Web API application. I have decided to use the System.Web.Helpers.WebCache class. A role is a plain string about 40 characters long, and each user may have between 1 and 10 roles.
I am thinking of two ways to do this:
Use WebCache.Set(UserID, List<String>): use the user id as the key and store the list of roles (strings) as the value. It's easy to retrieve.
Use a dictionary where the user id is the key and the list of roles is the value, and then cache the whole dictionary. This way I am caching under only one key. When I retrieve this information, I first retrieve the dictionary and then use the user id to get the role information.
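Roughly, the two approaches would look like this (the "AllUserRoles" key in approach two is just an illustrative name):

using System.Collections.Generic;
using System.Web.Helpers;

public static class RoleCacheSketch
{
    // Approach 1: one cache entry per user, keyed by the user id.
    public static void SetRoles(string userId, List<string> roles)
    {
        // 20-minute sliding expiration is the WebCache default, shown explicitly here.
        WebCache.Set(userId, roles, minutesToCache: 20, slidingExpiration: true);
    }

    public static List<string> GetRoles(string userId)
    {
        object cached = WebCache.Get(userId);
        return cached as List<string>;   // null if the entry is not (or no longer) cached
    }

    // Approach 2: a single cache entry holding a dictionary of every user's roles.
    public static List<string> GetRolesViaDictionary(string userId)
    {
        object cached = WebCache.Get("AllUserRoles");
        var all = cached as Dictionary<string, List<string>>;
        if (all == null)
        {
            return null;   // the whole dictionary would need to be (re)built and cached here
        }
        List<string> roles;
        return all.TryGetValue(userId, out roles) ? roles : null;
    }
}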
Questions:
Which approach is better? I like approach one as it's easy to use. Does it have any downside?
The way I estimated the memory needed to keep these keys in the cache was to put the same amount of data (10 roles stored as strings) into a text file and check its size (using UTF-8 encoding). The content was about 500 bytes (4 KB on disk). So for 200 users I would estimate 200 × 500 bytes ≈ 100 KB. Is this a reasonable (approximate is fine) way to calculate it?
I prefer the approach of saving individual keys instead of saving the roles of all users as a single cache object.
Following are the reasons:
1) Creation is simple: when the user logs in (or at another appropriate moment), the cache is checked and, if the entry is missing, it is created for that user. There is no need to iterate through a dictionary object (or use LINQ) to get to that key.
2) When the user logs off (or at another appropriate moment), that user's cache entry is destroyed completely, rather than having to remove one key from a shared cached object.
3) There is no need to lock the object when multiple users try to access it at the same time, and that scenario will happen. Since an entry is created per user, there is no contention on a shared object and no need for synchronization or a mutex. A minimal sketch of this per-user pattern follows this list.
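Sketch of the per-user pattern described above (LoadRolesFromDatabase stands in for the real data-access call):

using System.Collections.Generic;
using System.Web.Helpers;

public static class UserRoleCache
{
    // 1) On login: create the entry only if it is not already cached.
    public static List<string> GetOrCreate(string userId)
    {
        object cached = WebCache.Get(userId);
        var roles = cached as List<string>;
        if (roles == null)
        {
            roles = LoadRolesFromDatabase(userId);   // placeholder for the real query
            WebCache.Set(userId, roles);
        }
        return roles;
    }

    // 2) On logoff (or when roles change): drop only that user's entry.
    // 3) Each entry belongs to a single user, so nothing shared needs locking.
    public static void Evict(string userId)
    {
        WebCache.Remove(userId);
    }

    private static List<string> LoadRolesFromDatabase(string userId)
    {
        return new List<string>();                   // placeholder
    }
}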
Thanks, Praveen
1. Solution one is preferable. It is straightforward and appears to offer only advantages.
2. Your calculation makes sense for option 1 but not for option 2. A C# dictionary uses hashing and takes up more memory; for primitive, short data like this, the hash-table overhead can be a significant relative increase.
For this type of application, memory usage measured in individual bytes is typically a secondary concern compared to maintainability and functionality. User roles are usually core functionality with significant security implications, and as the project grows it becomes very important that the code remains maintainable and secure.
Caching should be used strictly as an optimization, and because this involves small amounts of data for a relatively small user base (~200 people), it is much better to make your caching of these roles granular and easy to refetch.
According to the official documentation for this library (Microsoft, System.Web.Helpers.WebCache):
"In general, you should never count on an item that you have cached to be in the cache."
And because I'll assume that user roles define some fairly important functionality, it would be better to add queries for these roles to your web API requests than to rely on them being stored locally.
However, if you are set on using this cache and refetching whenever the entry disappears, then based on your question, option one would be the preferable choice.
This is because a list takes less memory and, in this case, appears more straightforward; I see no benefit from using a dictionary.
Dictionaries shine when you have large datasets and need fast lookups, but for this scenario, where all the data is already in memory and the data set is relatively small, a dictionary introduces complexity and higher memory requirements and not much else. In either scenario the memory usage should be negligible on most modern devices and servers.
While a dictionary may sound compelling given your need to look up roles by user, the WebCache class already offers that functionality, so an additional dictionary loses its appeal.
Q1: Without knowing the actual usage of the cache items, it is difficult to draw a conclusion. Nonetheless, I think it all comes down to the intended lifespan of those items. If you want to retire them all at once after a certain period and then query a new set of data, storing a ConcurrentDictionary that maps users to roles in WebCache is the easier solution to manage.
Otherwise, if you want to retire each entry individually in response to some event, approach one is the more straightforward answer. Just be mindful: if you choose approach two, use ConcurrentDictionary instead of Dictionary, because the latter is not thread safe.
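If you do go with approach two, a thread-safe sketch might look like this (the "AllUserRoles" key and LoadRolesFromDatabase are illustrative placeholders):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Web.Helpers;

public static class RoleDictionaryCache
{
    private const string CacheKey = "AllUserRoles";

    public static List<string> GetRoles(string userId)
    {
        object cached = WebCache.Get(CacheKey);
        var map = cached as ConcurrentDictionary<string, List<string>>;
        if (map == null)
        {
            map = new ConcurrentDictionary<string, List<string>>();
            WebCache.Set(CacheKey, map);
        }

        // GetOrAdd is thread safe; a plain Dictionary would need explicit locking here.
        return map.GetOrAdd(userId, id => LoadRolesFromDatabase(id));
    }

    private static List<string> LoadRolesFromDatabase(string userId)
    {
        return new List<string>();   // placeholder for the real query
    }
}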
Q2: WebCache is fundamentally an enumerable collection of key/value pairs, so it stores the key strings and the memory locations of each value, apart from the objects' metadata. A ConcurrentDictionary/Dictionary, on the other hand, stores the hash codes of the key strings along with the memory locations of each value. While each key's byte length is very small, its hash-related overhead can be slightly larger than the string itself. Otherwise, the per-key sizes are predictable and reasonably slim (around 10 bytes in my test), and each added entry grows the collection by roughly 30 bytes. Of course, these figures exclude the actual size of the value, which is independent of the collection.
You can calculate the size of the string by using:
System.Text.Encoding.UTF8.GetByteCount(key);
You might also find it useful to have a helper that approximates the size of an object:
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Approximates an object's size by measuring its binary-serialized length.
// The object graph must be marked [Serializable] (List<string> is).
static long GetSizeOfObject(object obj)
{
    using (var stream = new MemoryStream())
    {
        var formatter = new BinaryFormatter();
        formatter.Serialize(stream, obj);
        return stream.Length;
    }
}
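Keep in mind this measures the binary-serialized length rather than the exact in-memory footprint. A quick usage example with made-up role names:

var roles = new List<string> { "RoleA", "RoleB", "RoleC" };   // illustrative data
long approxBytes = GetSizeOfObject(roles);                    // rough size in bytes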
First of all, make sure there is an abstraction layer so you can easily change the implementation in the future.
I can't see any significant difference between these two approaches; both of them use a hash table for lookup.
But the second one performs two lookups, I suppose: one to find the dictionary in the cache and one to find the user in the dictionary.
In addition, I would recommend:
- If there is a huge number of users, store role ids rather than role strings; with only 1,000-10,000 users there is no sense in doing it.
- Do not forget to clear the cache entry when a user's roles are updated.
You don't need option 2; option 1 should suffice, since all you need is a key and a List<string>.
A few points to consider in general before using caching:
- What is the amount of data being cached?
- What mode of caching are you using (in-memory or distributed)?
- How are you going to manage the cache?
- If the data being cached grows beyond a threshold, what is the failover mechanism?
Caching has its pros and cons. In your scenario you have already done the payload analysis, so I don't see any issue with option 1.
I'm trying to retrieve all keys matching a pattern with StackExchange.Redis.
Code
KEYS *o*
On the project homepage is linked Where are KEYS, SCAN, FLUSHDB etc? which gives full details on how to access this, and why it isn't on IDatabase. I should point out that you should avoid KEYS on a production server. The library will automatically try to use SCAN instead if it is available - which is less harmful but should still be treated with some caution. It would be preferable to explicitly store related keys in a set or hash.
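For completeness, a minimal sketch of the server-side key scan (the connection string, database number, and pattern are example values):

using System;
using StackExchange.Redis;

class RedisKeyScan
{
    static void Main()
    {
        var connection = ConnectionMultiplexer.Connect("localhost:6379");

        // KEYS/SCAN are server-level commands, so they live on IServer, not IDatabase.
        var server = connection.GetServer(connection.GetEndPoints()[0]);

        // The library uses SCAN when the server supports it, falling back to KEYS.
        foreach (var key in server.Keys(database: 0, pattern: "*o*"))
        {
            Console.WriteLine(key);
        }
    }
}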
Generally speaking, Azure Table IO performance improves as more partitions are used (with some tradeoffs in continuation tokens and batch updates I won't go into).
Since the partition key is always a string I am considering using a "natural" load balancing technique based on a subset of the GetHashCode() of the partition key, and appending this subset to the partition key itself. This will allow all direct PK/RK queries to be computed with little overhead and with ease. Batch updates may just need an intermediate to group similar PKs together prior to submission.
Question:
Should I use GetHashCode() to compute the partition key? Is a better function available?
If I use GetHashCode() does it matter which character I use for my PK?
Is there an abstraction for Azure Table and Blob storage that does this for me already?
No, don't use GetHashCode as its value is only guaranteed to be stable in the current AppDomain. Otherwise, it can change anytime.
Use a hash function which you control or which is standardized. Google has put out a set of hashes for this purpose including "murmur hash".
What should you partition (and hash) on? That depends on your query patterns. It absolutely cannot be answered without looking at your query patterns. In general, try to partition on something that is a predicate in almost all of your queries.
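As an illustration of a stable alternative to GetHashCode, one could prefix the partition key with a bucket derived from a simple, well-known hash such as FNV-1a; the bucket count of 16 here is an arbitrary example:

using System.Text;

static class PartitionKeyHelper
{
    private const int BucketCount = 16;   // illustrative choice

    // FNV-1a is stable across processes and AppDomains, unlike string.GetHashCode().
    public static string WithBucketPrefix(string partitionKey)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(partitionKey))
        {
            hash ^= b;
            hash *= 16777619;
        }
        int bucket = (int)(hash % BucketCount);
        return bucket.ToString("X") + "_" + partitionKey;   // e.g. "A_customer-42"
    }
}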
What's the best way to implement a method that creates and assigns IDs to users in an ASP.NET application?
I was thinking about using DateTime ticks and the thread id.
I want to make sure that there are no collisions and the user ids are unique.
The ID can be a string or a long.
Should I use MD5 on some information that I collect from the user? What would that be?
I have seen that the MD5 collision rate is very low.
I would use GUIDs, based on the limited information you've given.
The simplest solution is an autoincremented number. This requires a central server.
Date/time plus a one-way hash gives pseudo-random IDs. Do they need to be pseudo-random for security? This should not be relied upon for uniqueness, because by definition one-way hashes can collide. You'd still need a central server to check for duplicates before issuing the ID.
GUIDs are best if the IDs are created in a distributed system (no central server to generate the ID). GUIDs can be generated on separate machines and shouldn't collide. It depends on the implementation, but some GUID algorithms are simply pseudo-random, and yes, there is still a small possibility of collision.
A Guid is by far the best choice for generating unique ids for something like a user id. For practical purposes they are guaranteed to be unique globally (hence the name). To work well with a clustered index you should use NEWSEQUENTIALID(). This generates sequential ids that are appended to the end of the index, which prevents SQL Server from having to reorganise and page the index every time a value is added. There is a small security concern with this function in that the next value in the sequence can be predicted.
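For illustration, generating the id in application code versus letting SQL Server assign it (the table and column names below are made up):

using System;

static class UserIdGenerator
{
    // Application-side: a random (version 4) GUID is effectively collision-free.
    public static Guid NewUserId()
    {
        return Guid.NewGuid();
    }
}

// Database-side alternative (T-SQL), so the clustered index stays append-only:
//   CREATE TABLE Users (
//       UserId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY
//   );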
We are using the HttpRuntime.Cache API in an ASP.NET application to cache data retrieved from a database.
For this particular application, our database queries feature a LOT of parameters, so our cache keys look something like this:
table=table1;param1=somevalue1;param2=somevalue2;param3=somevalue3;param4=somevalue4;param5=somevalue5;param6=somevalue6... etc...
For some queries, we have so many parameters that the cache key is several hundred characters long.
My question: is there a limit to the length of these cache keys? Internally, it uses a dictionary, so theoretically the lookup time should be constant. However, I wonder whether we could run into a performance or memory problem.
Internally, the dictionary uses the hash code of the key you give it, so lookups effectively compare integers first and only fall back to a full string comparison when hash codes collide. A key of a few hundred characters adds a negligible amount of hashing and storage cost.
You have nothing to worry about.
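That said, if the very long composite keys ever did become a concern, one option (purely a sketch, not something HttpRuntime.Cache requires) would be to hash the composite key down to a fixed length before using it:

using System;
using System.Security.Cryptography;
using System.Text;

static class CacheKeyHelper
{
    // Collapses an arbitrarily long composite key into a fixed-length one.
    // SHA-256 makes accidental collisions between different parameter sets negligible.
    public static string Shorten(string compositeKey)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(compositeKey));
            return Convert.ToBase64String(hash);   // 44 characters regardless of input length
        }
    }
}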