I have a static class which handles the cache read/write for frequently used data.
The code is this:
public static T GetFromCache<T>(double seconds, string cacheId, Func<T> method) where T : class
{
    HttpContext ctx = HttpContext.Current;
    object temp = null;
    temp = ctx.Cache[cacheId];
    if (temp == null)
    {
        lock (Sync)
        {
            temp = ctx.Cache[cacheId];
            if (temp == null)
            {
                temp = method.Invoke();
                AddToCache(temp as T, seconds, cacheId);
                return temp as T;
            }
        }
    }
    if (temp is T)
    {
        return (T)temp;
    }
    return null;
}
The code is used by various callers to read data from and write data to the cache.
Now I have a Sync object (private static readonly object Sync = new object();) which gets locked when data gets written to the cache.
As this code is called by multiple callers, I would like to create a list of Sync objects, one for each caller. (By caller I don't mean the user, but the calling code; I would identify a caller by the signature of the method parameter.)
The reason I want this is that every piece of calling code can have its own lock object; otherwise (I think) every call to this cache controller from different callers will use the same lock object. Then the caching for the list of countries will also lock the caching of the list of states, whereas with two different lock objects they will not be in each other's way.
I would then use the CacheItemRemovedCallback method to remove the lockitems from the list.
The question is this: How can I do that?
Having one Sync object per user will defeat the purpose of synchronization, as each user will hold its own lock and there will be a chance that, for the same cacheId, you end up invoking the method multiple times. This might result in the data becoming inconsistent.
If you wish to keep one Sync object per user, then it's best to make use of session variables, a per-user cache, or something similar; otherwise each user will end up interfering with the other users' cacheId results.
If you have a scenario where there can be multiple readers of the data but only a single writer at a time, then try using ReaderWriterLockSlim. It is very fast compared to lock in a multi-user scenario.
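For illustration, here is a minimal sketch of a read-mostly cache accessor built on ReaderWriterLockSlim; the GetOrLoad name and the direct use of the Cache indexer are assumptions for this sketch, not code from the question:

private static readonly ReaderWriterLockSlim CacheLock = new ReaderWriterLockSlim();

public static T GetOrLoad<T>(string cacheId, Func<T> load) where T : class
{
    CacheLock.EnterReadLock();
    try
    {
        var cached = HttpContext.Current.Cache[cacheId] as T;
        if (cached != null)
            return cached;
    }
    finally
    {
        CacheLock.ExitReadLock();
    }

    CacheLock.EnterWriteLock();
    try
    {
        // Re-check: another writer may have filled the entry while we waited.
        var cached = HttpContext.Current.Cache[cacheId] as T;
        if (cached != null)
            return cached;

        var value = load();
        HttpContext.Current.Cache[cacheId] = value; // sketch only: no expiration handling
        return value;
    }
    finally
    {
        CacheLock.ExitWriteLock();
    }
}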
Update 1
Assuming the cacheId is unique and not shared among the callers, you can use the following code.
No lock is needed here, because HttpContext.Cache is thread-safe: you can read and save values to the Cache from multiple threads. But if the value reference itself is shared among concurrent calls, please synchronize access to it.
public static T GetFromCache<T>(double seconds, string cacheId, Func<T> method) where T : class
{
    HttpContext ctx = HttpContext.Current;
    object temp = ctx.Cache[cacheId];
    if (temp == null)
    {
        temp = method.Invoke();
        AddToCache(temp as T, seconds, cacheId);
    }
    return temp as T;
}
Related
I have a web method, Upload Transaction (an ASMX web service), that takes an XML file, validates it, and stores the file's contents in a SQL Server database. We noticed that certain users can submit the same file twice at the same time, so we can end up with the same codes twice in our database (we cannot use a unique index on the database or do anything at the database level; don't ask me why). I thought I could use the lock statement on the user ID string, but I don't know if this will solve the issue. Alternatively, I could use a cached object to store all user ID requests, and if we get two requests from the same user ID, execute the first one and block the second with an error message.
So if anyone has any idea, please help.
Blocking on strings is bad. Blocking your webserver is bad.
AsyncLocker is a handy class that I wrote to allow locking on any type that behaves nicely as a key in a dictionary. It also requires asynchronous awaiting before entering the critical section (as opposed to the normal blocking behaviour of locks):
public class AsyncLocker<T>
{
    private LazyDictionary<T, SemaphoreSlim> semaphoreDictionary =
        new LazyDictionary<T, SemaphoreSlim>();

    public async Task<IDisposable> LockAsync(T key)
    {
        var semaphore = semaphoreDictionary.GetOrAdd(key, () => new SemaphoreSlim(1, 1));
        await semaphore.WaitAsync();
        return new ActionDisposable(() => semaphore.Release());
    }
}
It depends on the following two helper classes:
LazyDictionary:
public class LazyDictionary<TKey, TValue>
{
    // Here we use Lazy<TValue> as the value in the dictionary
    // to guard against the fact that the initializer function
    // in ConcurrentDictionary.GetOrAdd *can*, under some conditions,
    // run more than once per key, with the result of all but one of
    // the runs being discarded.
    // If this happens, only uninitialized Lazy values are discarded.
    // Only the Lazy that actually made it into the dictionary is
    // materialized by accessing its Value property.
    private ConcurrentDictionary<TKey, Lazy<TValue>> dictionary =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue GetOrAdd(TKey key, Func<TValue> valueGenerator)
    {
        var lazyValue = dictionary.GetOrAdd(key,
            k => new Lazy<TValue>(valueGenerator));
        return lazyValue.Value;
    }
}
ActionDisposable:
public sealed class ActionDisposable : IDisposable
{
    // Useful for making arbitrary IDisposable instances
    // that perform an Action when Dispose is called
    // (after a using block, for instance).
    private Action action;

    public ActionDisposable(Action action)
    {
        this.action = action;
    }

    public void Dispose()
    {
        var action = this.action;
        if (action != null)
        {
            action();
        }
    }
}
Now, if you keep a static instance of this somewhere:
static AsyncLocker<string> userLock = new AsyncLocker<string>();
you can use it in an async method, leveraging the delights of LockAsync's IDisposable return type to write a using statement that neatly wraps the critical section:
using (await userLock.LockAsync(userId))
{
    // Only one caller with this userId is allowed
    // in this section at a time.
}
If we need to wait before entering, it's done asynchronously, freeing up the thread to service other requests, instead of blocking until the wait is over and potentially messing up your server's performance under load.
Of course, when you need to scale to more than one webserver, this approach will no longer work, and you'll need to synchronize using a different means (probably via the DB).
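As a hedged sketch of the database route, SQL Server's sp_getapplock can serialize work across web servers; the connection string, resource name, and timeout below are assumptions for illustration only:

// Sketch only: cross-server mutual exclusion via SQL Server's sp_getapplock.
// Assumes using System.Data and System.Data.SqlClient.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var cmd = new SqlCommand("sp_getapplock", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@Resource", "upload:" + userId);
        cmd.Parameters.AddWithValue("@LockMode", "Exclusive");
        cmd.Parameters.AddWithValue("@LockOwner", "Session");
        cmd.Parameters.AddWithValue("@LockTimeout", 10000); // milliseconds
        var returnValue = new SqlParameter("@ReturnValue", SqlDbType.Int)
        {
            Direction = ParameterDirection.ReturnValue
        };
        cmd.Parameters.Add(returnValue);
        cmd.ExecuteNonQuery();
        if ((int)returnValue.Value < 0)
            throw new TimeoutException("Could not acquire the application lock.");
    }

    // ... critical section: validate and store the upload ...

    // Release with sp_releaseapplock, or let closing the session release it.
}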
I have to query my company's CRM solution (Oracle's Right Now) for our 600k users, and update them there if they exist or create them if they don't. To know whether a user already exists in Right Now, I consume a third-party web service, and with 600k users this can be a real pain due to the time it takes to get each response (around 1 second). So I changed my code to use Parallel.ForEach, querying each record in just 0.35 seconds and adding it to a List<User> of records to be created or updated (Right Now is kinda dumb, so I need to separate them into 2 lists and call 2 distinct web service methods).
My code used to run perfectly before multithreading, but it took too long. The problem is that I can't make a batch too large or I get a timeout when I try to update or create via the web service. So I'm sending around 500 records at once, and when the critical code part runs, it executes many times.
Parallel.ForEach(boDS.USERS.AsEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = -1 }, row =>
{
    ...
    user = null;
    user = QueryUserById(row["USER_ID"].Trim());
    if (user == null)
    {
        isUpdate = false;
        gObject.ID = new ID();
    }
    else
    {
        isUpdate = true;
        gObject.ID = user.ID;
    }
    ... fill user attributes as generic fields ...
    gObject.GenericFields = listGenericFields.ToArray();
    if (isUpdate)
        listUserUpdate.Add(gObject);
    else
        listUserCreate.Add(gObject);
    if (i == batchSize - 1 || i == (boDS.USERS.Rows.Count - 1))
    {
        UpdateProcessingOptions upo = new UpdateProcessingOptions();
        CreateProcessingOptions cpo = new CreateProcessingOptions();
        upo.SuppressExternalEvents = false;
        upo.SuppressRules = false;
        cpo.SuppressExternalEvents = false;
        cpo.SuppressRules = false;
        RNObject[] results = null;
        // <Critical_code>
        if (listUserCreate.Count > 0)
        {
            results = _service.Create(_clientInfoHeader, listUserCreate.ToArray(), cpo);
        }
        if (listUserUpdate.Count > 0)
        {
            _service.Update(_clientInfoHeader, listUserUpdate.ToArray(), upo);
        }
        // </Critical_code>
        listUserUpdate = new List<RNObject>();
        listUserCreate = new List<RNObject>();
    }
    i++;
});
I thought about using lock or a mutex, but that isn't going to help me, since the threads will just wait and execute afterwards. I need some solution to execute that part of the code only ONCE, in only ONE thread. Is it possible? Can anyone shed some light?
As you stated in the comments, you're declaring the variables outside of the loop body. That's where your race conditions originate.
Take the variable listUserUpdate, for example. It's accessed concurrently by parallel threads: while one thread is still adding to it in listUserUpdate.Add(gObject);, another thread could already be resetting the list in listUserUpdate = new List<RNObject>(); or enumerating it in listUserUpdate.ToArray().
You really need to refactor that code to
make each iteration run as independently from the others as you can, by moving variables inside the loop body, and
access shared data in a synchronized way, using locks and/or concurrent collections,
as shown in the sketch below.
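As a rough sketch only (gObject's real type, QueryUserById, and the final batch flush are placeholders for the question's elided code), the shape of such a refactoring could be:

object batchLock = new object();
var listUserCreate = new List<RNObject>();
var listUserUpdate = new List<RNObject>();

Parallel.ForEach(boDS.USERS.AsEnumerable(), row =>
{
    // Per-iteration state lives inside the lambda, so no other thread touches it.
    var user = QueryUserById(row["USER_ID"].Trim());
    bool isUpdate = user != null;
    var gObject = new RNObject();        // placeholder for the real object setup
    gObject.ID = isUpdate ? user.ID : new ID();
    // ... fill user attributes as generic fields ...

    // Only the shared batch lists need synchronization.
    lock (batchLock)
    {
        if (isUpdate)
            listUserUpdate.Add(gObject);
        else
            listUserCreate.Add(gObject);
    }
});

// Flush listUserCreate and listUserUpdate once, after the loop has completed.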
You can use the double-checked locking pattern. It is usually applied to singletons, but you're not making a singleton here, so generic singleton helpers like Lazy<T> do not apply.
It works like this:
Separate out your shared data into some sort of class:
class QuerySharedData {
    // All the write-once-read-many fields that need to be shared between threads

    public QuerySharedData() {
        // Compute all the write-once-read-many fields. Or use a static Create method if that's handy.
    }
}
In your outer class add the following:
private readonly object padlock = new object();
private volatile QuerySharedData data;
In your thread's callback delegate, do this:
if (data == null)
{
    lock (padlock)
    {
        if (data == null)
        {
            // This does all the work to initialize the shared fields.
            data = new QuerySharedData();
        }
    }
}
var localData = data;
Then use the shared query data from localData. By grouping the shared query data into a subordinate class, you avoid the necessity of making its individual fields volatile.
More about volatile here: Part 4: Advanced Threading.
Update: my assumption here is that all the classes and fields held by QuerySharedData are read-only once initialized. If this is not true, for instance if you initialize a list once but add to it from many threads, this pattern will not work for you. You will have to consider using things like Thread-Safe Collections.
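For instance, a minimal sketch of sharing mutable results through a thread-safe collection (the QueryResults class and its members are hypothetical):

using System.Collections.Concurrent;

class QueryResults
{
    // ConcurrentBag tolerates concurrent Add calls from many threads.
    private readonly ConcurrentBag<string> results = new ConcurrentBag<string>();

    public void Record(string item)
    {
        results.Add(item);
    }

    // Take a point-in-time copy once the parallel work has completed.
    public string[] Snapshot()
    {
        return results.ToArray();
    }
}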
There are a great number of articles available regarding thread-safe caching; here's an example:
private static object _lock = new object();

public void CacheData()
{
    SPListItemCollection oListItems;
    oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
    if (oListItems == null)
    {
        lock (_lock)
        {
            // Ensure that the data was not loaded by a concurrent thread
            // while waiting for the lock.
            oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
            if (oListItems == null)
            {
                oListItems = DoQueryToReturnItems();
                Cache.Add("ListItemCacheName", oListItems, ..);
            }
        }
    }
}
However, this example depends on the request for the cache also rebuilding the cache.
I'm looking for a solution where the request and rebuild are separate. Here's the scenario.
I have a web service that I want to monitor for certain types of error. If an error occurs, I create a monitor object and cache it; it is updatable and is locked accordingly during updates. All's well so far.
Elsewhere, I check for the existence of the cached object and the data it contains. This would work straight out of the box except for one particular scenario.
If the cache object is being updated (say, a status change), I would like to wait and get the latest info rather than the current info, which, if returned, would be out of date. So my fetch code needs to check whether the object is currently being created or updated, and if so, wait and then retry.
As I pointed out, there are many examples of cache-locking patterns, but I can't seem to find one for this scenario. Any ideas as to how to go about this would be appreciated.
You can try the following code, which uses two locks. The write lock in the setter is quite simple and protects the cache from being written by more than one thread. The getter uses a simple double-check lock.
Now, the trick is in the Refresh() method, which uses the same lock as the getter. The method takes the lock and, as its first step, removes the list from the cache. This will cause any getter to fail the first null check and wait for the lock. The Refresh() method, in the meantime, gets the items, sets the cache again, and releases the lock.
When control comes back to a getter, it reads the cache again, and now it contains the list.
public class CacheData
{
    private static object _readLock = new object();
    private static object _writeLock = new object();

    public SPListItemCollection ListItem
    {
        get
        {
            var oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
            if (oListItems == null)
            {
                lock (_readLock)
                {
                    oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
                    if (oListItems == null)
                    {
                        oListItems = DoQueryToReturnItems();
                        Cache.Add("ListItemCacheName", oListItems, ..);
                    }
                }
            }
            return oListItems;
        }
        set
        {
            lock (_writeLock)
            {
                Cache.Add("ListItemCacheName", value, ..);
            }
        }
    }

    public void Refresh()
    {
        lock (_readLock)
        {
            Cache.Remove("ListItemCacheName");
            var oListItems = DoQueryToReturnItems();
            ListItem = oListItems;
        }
    }
}
You can make the method and property static if you do not need a CacheData instance.
I have a number of static Lists in my application, which are used to store data from my database and are used when looking up information:
public static IList<string> Names;
I also have some methods to refresh this data from the database:
public static void GetNames()
{
    SQLEngine sql = new SQLEngine(ConnectionString);
    lock (Names)
    {
        Names = sql.GetDataTable("SELECT * FROM Names").ToList<string>();
    }
}
I initially didn't have the lock() in place; however, I noticed that very occasionally the requesting thread couldn't find the information in the list. Now, I am assuming that if the requesting thread tries to access the Names list, it can't until the list has been fully updated.
Is this the correct methodology and usage of the lock() statement?
As a side note, I noticed on MSDN that one shouldn't use lock() on public variables. Could someone please elaborate for my particular scenario?
lock is only useful if all places intended to be synchronized also apply the lock. So every time you access Names you would be required to lock. At the moment, that only stops 2 threads swapping Names at the same time, which frankly isn't a problem here, as reference swaps are atomic anyway.
Another problem: presumably Names starts off null? You can't lock on null. Equally, you shouldn't lock on something that may change reference. If you want to synchronize, a common approach is something like:
// do not use for your scenario - see below
private static readonly object lockObj = new object();
then lock(lockObj) instead of your data.
With regard to not locking things that are visible externally: yes. That is because some other code could randomly choose to lock on it, which could cause unexpected blocking and quite possibly deadlocks.
The other big risk is that some of your code obtains the names, and then does a sort/add/remove/clear/etc - anything that mutates the data. Personally, I would be using a read-only list here. In fact, with a read-only list, all you have is a reference swap; since that is atomic, you don't need any locking:
public static IList<string> Names { get; private set; }

public static void UpdateNames()
{
    List<string> tmp = SomeSqlQuery();
    Names = tmp.AsReadOnly();
}
And finally: public fields are very, very rarely a good idea. Hence the property above. This will be inlined by the JIT, so there is no penalty.
No, it's not correct since anyone can use the Names property directly.
public class SomeClass
{
    private List<string> _names;
    private object _namesLock = new object();

    public IEnumerable<string> Names
    {
        get
        {
            if (_names == null)
            {
                lock (_namesLock)
                {
                    if (_names == null)
                        GetNames();
                }
            }
            return _names;
        }
    }

    public void UpdateNames()
    {
        lock (_namesLock)
            GetNames();
    }

    private void GetNames()
    {
        SQLEngine sql = new SQLEngine(ConnectionString);
        _names = sql.GetDataTable("SELECT * FROM Names").ToList<string>();
    }
}
Try to avoid static methods. At least use a singleton.
The check-lock-check is faster than a lock-check, since the write will only occur once.
Assigning a property on first usage is called lazy loading.
The _namesLock is required since you can't lock on null.
From the code you have shown, the first time GetNames() is called the Names property is null. A lock on a null reference throws an exception, so I would add a variable to lock on.
static object namesLock = new object();
Then in GetNames()
lock (namesLock)
{
    if (Names == null)
        Names = ...;
}
We do the if test inside the lock() to stop race conditions. I'm assuming that the caller of GetNames() also does the same test.
OK, I was a little unsure how best to name this problem :) But assume this scenario: you're going out and fetching some webpage (with various URLs) and caching it locally. The cache part is pretty easy to solve, even with multiple threads.
However, imagine that one thread starts fetching a URL, and a couple of milliseconds later another wants to get the same URL. Is there any good pattern for making the second thread's method wait on the first one to fetch the page, insert it into the cache, and return it, so you don't have to do multiple requests? With little enough overhead that it's worth doing even for requests that take about 300-700 ms? And without locking requests for other URLs?
Basically, when requests for identical URLs come in tightly after each other, I want the second request to "piggyback" on the first request.
I had a loose idea of having a dictionary where you insert an object with the URL as the key when you start fetching a page, and lock on it. If there's already a matching key, the second request gets the object, locks on it, and then tries to fetch the URL from the actual cache.
I'm a little unsure of the particulars of making it really thread-safe, however; using ConcurrentDictionary might be one part of it...
Is there any common pattern or solution for scenarios like this?
Breakdown of the wrong behavior:
Thread 1: checks the cache, the entry doesn't exist, so it starts fetching the URL
Thread 2: starts fetching the same URL since it still doesn't exist in the cache
Thread 1: finishes and inserts into the cache, returns the page
Thread 2: finishes and also inserts into the cache (or discards it), returns the page
Breakdown of the correct behavior:
Thread 1: checks the cache, the entry doesn't exist, so it starts fetching the URL
Thread 2: wants the same URL, but sees it's currently being fetched, so it waits on thread 1
Thread 1: finishes and inserts into the cache, returns the page
Thread 2: notices that thread 1 is finished and returns the page thread 1 fetched
EDIT
Most solutions so far seem to misunderstand the problem and address only the caching. As I said, that isn't the problem; the problem is that when a second external web fetch starts before the first one has cached its result, the second fetch should use the result of the first one rather than fetching again.
You could use a ConcurrentDictionary<K,V> and a variant of double-checked locking:
public static string GetUrlContent(string url)
{
    object value1 = _cache.GetOrAdd(url, new object());

    if (value1 == null)    // null check only required if the content
        return null;       // could legitimately be a null string

    var urlContent = value1 as string;
    if (urlContent != null)
        return urlContent; // got the content

    // value1 isn't a string, which means that it's an object to lock against
    lock (value1)
    {
        object value2 = _cache[url];

        // at this point value2 will *either* be the url content
        // *or* the object that we already hold a lock against
        if (value2 != value1)
            return (string)value2; // got the content

        urlContent = FetchContentFromTheWeb(url); // todo
        _cache[url] = urlContent;
        return urlContent;
    }
}

private static readonly ConcurrentDictionary<string, object> _cache =
    new ConcurrentDictionary<string, object>();
EDIT: My code is quite a bit uglier now, but it uses a separate lock per URL. This allows different URLs to be fetched asynchronously; however, each URL will only be fetched once.
public class UrlFetcher
{
    static Hashtable cache = Hashtable.Synchronized(new Hashtable());

    public static String GetCachedUrl(String url)
    {
        // exactly 1 fetcher is created per URL
        InternalFetcher fetcher = (InternalFetcher)cache[url];
        if (fetcher == null)
        {
            lock (cache.SyncRoot)
            {
                fetcher = (InternalFetcher)cache[url];
                if (fetcher == null)
                {
                    fetcher = new InternalFetcher(url);
                    cache[url] = fetcher;
                }
            }
        }
        // blocks all threads requesting the same URL
        return fetcher.Contents;
    }

    /// <summary>Each fetcher locks on itself and is initialized with null contents.
    /// The first thread to call fetcher.Contents will cause the fetch to occur, and
    /// block until completion.</summary>
    private class InternalFetcher
    {
        private String url;
        private String contents;

        public InternalFetcher(String url)
        {
            this.url = url;
            this.contents = null;
        }

        public String Contents
        {
            get
            {
                if (contents == null)
                {
                    lock (this) // "this" is an instance of InternalFetcher...
                    {
                        if (contents == null)
                        {
                            contents = FetchFromWeb(url);
                        }
                    }
                }
                return contents;
            }
        }
    }
}
Will the Semaphore please stand up! stand up! stand up!
Use a semaphore; you can easily synchronize your threads with it.
It helps in both cases where
you are trying to load a page that is currently being cached, and
you are saving the cache to a file while a page is being loaded from it.
In both scenarios you will run into trouble without it.
It is just like the readers-writers problem, a classic synchronization problem in operating systems. When a thread wants to rebuild the cache or start caching a page, no thread should read from it; if a thread is reading, the writer should wait until the reading is finished before replacing the cache; and no two threads should cache the same page into the same file. Conversely, all readers can read from the cache at any time while no writer is writing to it.
You should look at some semaphore usage samples on MSDN; it is very easy to use. A thread that wants to do something acquires the semaphore: if the resource can be granted, it does its work; otherwise it sleeps and waits to be woken up when the resource is ready. A minimal sketch follows below.
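As a sketch only, assuming one SemaphoreSlim per URL held in a ConcurrentDictionary (ReadFromCache, WriteToCache, and FetchFromWeb are hypothetical helpers standing in for your cache and HTTP code):

private static readonly ConcurrentDictionary<string, SemaphoreSlim> Gates =
    new ConcurrentDictionary<string, SemaphoreSlim>();

public static string GetPage(string url)
{
    var gate = Gates.GetOrAdd(url, _ => new SemaphoreSlim(1, 1));
    gate.Wait(); // only one fetcher per URL at a time
    try
    {
        string cached = ReadFromCache(url);
        if (cached != null)
            return cached;

        string content = FetchFromWeb(url);
        WriteToCache(url, content);
        return content;
    }
    finally
    {
        gate.Release();
    }
}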
Disclaimer: this might be a n00bish answer. Please pardon me if it is.
I'd recommend using a shared dictionary object with locks to keep track of the URLs that are currently being fetched or have already been fetched.
At every request, check the URL against this object.
If an entry for the URL is present, check the cache. (This means one of the threads has either fetched it already or is currently fetching it.)
If it's available in the cache, use it; otherwise put the current thread to sleep for a while and check back again. (If it's not in the cache, some thread is still fetching it, so wait while it finishes.)
If the entry is not found in the dictionary object, add the URL to it and send the request. Once a response is obtained, add it to the cache.
This logic should work; however, you would need to take care of cache expiration and of removing entries from the dictionary object. A rough sketch follows.
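As a sketch only, using the same hypothetical ReadFromCache, WriteToCache, and FetchFromWeb helpers, with a ConcurrentDictionary tracking the in-flight URLs:

private static readonly ConcurrentDictionary<string, bool> InFlight =
    new ConcurrentDictionary<string, bool>();

public static string GetPage(string url)
{
    while (true)
    {
        string cached = ReadFromCache(url);
        if (cached != null)
            return cached;

        // TryAdd succeeds for exactly one thread per URL.
        if (InFlight.TryAdd(url, true))
        {
            try
            {
                string content = FetchFromWeb(url);
                WriteToCache(url, content);
                return content;
            }
            finally
            {
                bool ignored;
                InFlight.TryRemove(url, out ignored);
            }
        }

        // Some other thread is fetching this URL; wait and re-check the cache.
        Thread.Sleep(50);
    }
}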
My solution is to use an AtomicBoolean to control access to the database when the cache has expired or the entry does not exist.
At any given moment, only one thread (call it the read thread) can access the database; the other threads spin until the read thread returns the data and writes it into the cache.
Here is the code, implemented in Java:
public class CacheBreakDownDefender<K, R> {

    /**
     * false = do not write null to the cache when the database returns a null value
     */
    private final boolean writeNullToCache;

    /**
     * one flag per distinct query key
     */
    private final ConcurrentHashMap<K, AtomicBoolean> selectingDBTagMap = new ConcurrentHashMap<>();

    // Singleton.get is the author's helper (not shown) that returns one instance per name.
    public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType) {
        return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(false));
    }

    public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType, boolean writeNullToCache) {
        return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(writeNullToCache));
    }

    private CacheBreakDownDefender(boolean writeNullToCache) {
        this.writeNullToCache = writeNullToCache;
    }

    public R readFromCache(K key, Function<K, ? extends R> getFromCache, Function<K, ? extends R> getFromDB, BiConsumer<K, R> writeCache) throws InterruptedException {
        R result = getFromCache.apply(key);
        if (result == null) {
            final AtomicBoolean selectingDB = selectingDBTagMap.computeIfAbsent(key, x -> new AtomicBoolean(false));
            if (selectingDB.compareAndSet(false, true)) {
                try {
                    result = getFromDB.apply(key);
                    if (result != null || writeNullToCache) {
                        writeCache.accept(key, result);
                    }
                } finally {
                    selectingDB.getAndSet(false);
                    selectingDBTagMap.remove(key);
                }
            } else {
                while (selectingDB.get()) {
                    TimeUnit.MILLISECONDS.sleep(0L);
                    // spin: do nothing until the reading thread clears the flag
                }
                return getFromCache.apply(key);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> map = new ConcurrentHashMap<>();
        CacheBreakDownDefender<String, String> instance = CacheBreakDownDefender.getInstance(String.class, String.class, true);
        for (int i = 0; i < 9; i++) {
            int finalI = i;
            new Thread(() -> {
                String kele = null;
                try {
                    if (finalI == 6) {
                        kele = instance.readFromCache("kele2", map::get, key -> "helloword2", map::put);
                    } else {
                        kele = instance.readFromCache("kele", map::get, key -> "helloword", map::put);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                log.info("result = {}", kele);
            }).start();
        }
        TimeUnit.SECONDS.sleep(2L);
    }
}
This is not exactly for concurrent caches but for all caches:
"A cache with a bad policy is another name for a memory leak" (Raymond Chen)