For starters, let me just throw it out there that I know the code below is not thread safe (correction: might be). What I am struggling with is finding an implementation that is, and one that I can actually get to fail under test. I am refactoring a large WCF project right now that needs some (mostly) static data cached, populated from a SQL database. The data needs to expire and "refresh" at least once a day, which is why I am using MemoryCache.
I know that the code below should not be thread safe, but I cannot get it to fail under heavy load. To complicate matters, a Google search shows implementations both ways (with and without locks), combined with debates about whether or not the locks are necessary.
Could someone with knowledge of MemoryCache in a multithreaded environment let me know definitively whether or not I need to lock where appropriate, so that a call to Remove (which will seldom be called, but is a requirement) will not throw during retrieval/repopulation?
public class MemoryCacheService : IMemoryCacheService
{
    private const string PunctuationMapCacheKey = "punctuationMaps";
    private static readonly ObjectCache Cache;
    private readonly IAdoNet _adoNet;

    static MemoryCacheService()
    {
        Cache = MemoryCache.Default;
    }

    public MemoryCacheService(IAdoNet adoNet)
    {
        _adoNet = adoNet;
    }

    public void ClearPunctuationMaps()
    {
        Cache.Remove(PunctuationMapCacheKey);
    }

    public IEnumerable GetPunctuationMaps()
    {
        if (Cache.Contains(PunctuationMapCacheKey))
        {
            return (IEnumerable) Cache.Get(PunctuationMapCacheKey);
        }

        var punctuationMaps = GetPunctuationMappings();
        if (punctuationMaps == null)
        {
            throw new ApplicationException("Unable to retrieve punctuation mappings from the database.");
        }

        if (punctuationMaps.Cast<IPunctuationMapDto>().Any(p => p.UntaggedValue == null || p.TaggedValue == null))
        {
            throw new ApplicationException("Null values detected in Untagged or Tagged punctuation mappings.");
        }

        // Store data in the cache
        var cacheItemPolicy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTime.Now.AddDays(1.0)
        };
        Cache.AddOrGetExisting(PunctuationMapCacheKey, punctuationMaps, cacheItemPolicy);
        return punctuationMaps;
    }

    //Go oldschool ADO.NET to break the dependency on the entity framework and need to inject the database handler to populate cache
    private IEnumerable GetPunctuationMappings()
    {
        var table = _adoNet.ExecuteSelectCommand("SELECT [id], [TaggedValue],[UntaggedValue] FROM [dbo].[PunctuationMapper]", CommandType.Text);
        if (table != null && table.Rows.Count != 0)
        {
            return AutoMapper.Mapper.DynamicMap<IDataReader, IEnumerable<PunctuationMapDto>>(table.CreateDataReader());
        }

        return null;
    }
}
The default MS-provided MemoryCache is entirely thread safe. Any custom implementation that derives from MemoryCache may not be thread safe, but plain MemoryCache out of the box is. Browse the source code of my open source distributed caching solution to see how I use it (MemCache.cs):
https://github.com/haneytron/dache/blob/master/Dache.CacheHost/Storage/MemCache.cs
While MemoryCache is indeed thread safe, as other answers have specified, it does have a common multithreading issue: if two threads try to Get from (or check Contains on) the cache at the same time, both will miss the cache, both will end up generating the result, and both will then add the result to the cache.
Often this is undesirable; the second thread should wait for the first to complete and use its result rather than generating the result twice.
This was one of the reasons I wrote LazyCache, a friendly wrapper on MemoryCache that solves these sorts of issues. It is also available on NuGet.
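For illustration, usage looks roughly like this. This is a hedged sketch: CachingService/IAppCache/GetOrAdd follow LazyCache's documented API, but check the version you install, and LoadPunctuationMapsFromDatabase is a hypothetical loader, not from the original post:

// Hedged sketch of LazyCache usage; verify against your installed version.
IAppCache cache = new CachingService();

// The factory delegate runs at most once per key even under concurrent
// misses; all callers receive the same cached result.
var punctuationMaps = cache.GetOrAdd(
    "punctuationMaps",
    () => LoadPunctuationMapsFromDatabase(), // hypothetical expensive load
    DateTimeOffset.Now.AddDays(1));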
As others have stated, MemoryCache is indeed thread safe. The thread safety of the data stored within it, however, is entirely up to how you use it.
To quote Reed Copsey from his awesome post regarding concurrency and the ConcurrentDictionary<TKey, TValue> type, which is of course applicable here:
If two threads call this [GetOrAdd] simultaneously, two instances of TValue can easily be constructed.
You can imagine that this would be especially bad if TValue is expensive to construct.
To work your way around this, you can leverage Lazy<T> very easily, which coincidentally is very cheap to construct. Doing this ensures that if we get into a multithreaded situation, we're only building multiple instances of Lazy<T> (which is cheap).
GetOrAdd() (GetOrCreate() in the case of MemoryCache) will return the same, singular Lazy<T> to all threads; the "extra" instances of Lazy<T> are simply thrown away.
Since the Lazy<T> doesn't do anything until .Value is called, only one instance of the object is ever constructed.
Now for some code! Below is an extension method for IMemoryCache which implements the above. It arbitrarily sets SlidingExpiration based on an int seconds method parameter, but this is entirely customizable based on your needs.
Note this is specific to .NET Core 2.0 apps.
public static T GetOrAdd<T>(this IMemoryCache cache, string key, int seconds, Func<T> factory)
{
    return cache.GetOrCreate<T>(key, entry => new Lazy<T>(() =>
    {
        entry.SlidingExpiration = TimeSpan.FromSeconds(seconds);
        return factory.Invoke();
    }).Value);
}
To call:
IMemoryCache cache;
var result = cache.GetOrAdd("someKey", 60, () => new object());
To perform this all asynchronously, I recommend using Stephen Toub's excellent AsyncLazy<T> implementation found in his article on MSDN, which combines the built-in lazy initializer Lazy<T> with the promise Task<T>:
public class AsyncLazy<T> : Lazy<Task<T>>
{
    public AsyncLazy(Func<T> valueFactory) :
        base(() => Task.Factory.StartNew(valueFactory))
    { }

    public AsyncLazy(Func<Task<T>> taskFactory) :
        base(() => Task.Factory.StartNew(() => taskFactory()).Unwrap())
    { }
}
Now the async version of GetOrAdd():
public static Task<T> GetOrAddAsync<T>(this IMemoryCache cache, string key, int seconds, Func<Task<T>> taskFactory)
{
    return cache.GetOrCreateAsync<T>(key, async entry => await new AsyncLazy<T>(async () =>
    {
        entry.SlidingExpiration = TimeSpan.FromSeconds(seconds);
        return await taskFactory.Invoke();
    }).Value);
}
And finally, to call:
IMemoryCache cache;
var result = await cache.GetOrAddAsync("someKey", 60, async () => new object());
Check out this link: http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache(v=vs.110).aspx
Go to the very bottom of the page (or search for the text "Thread Safety").
You will see:
Thread Safety
This type is thread safe.
As mentioned by @AmitE in @pimbrouwers' answer, his example does not work, as demonstrated here:
class Program
{
    static async Task Main(string[] args)
    {
        var cache = new MemoryCache(new MemoryCacheOptions());
        var tasks = new List<Task>();
        var counter = 0;

        for (int i = 0; i < 10; i++)
        {
            var loc = i;
            tasks.Add(Task.Run(() =>
            {
                var x = GetOrAdd(cache, "test", TimeSpan.FromMinutes(1), () => Interlocked.Increment(ref counter));
                Console.WriteLine($"Iteration {loc} got {x}");
            }));
        }

        await Task.WhenAll(tasks);
        Console.WriteLine("Total value creations: " + counter);
        Console.ReadKey();
    }

    public static T GetOrAdd<T>(IMemoryCache cache, string key, TimeSpan expiration, Func<T> valueFactory)
    {
        return cache.GetOrCreate(key, entry =>
        {
            entry.SetSlidingExpiration(expiration);
            return new Lazy<T>(valueFactory, LazyThreadSafetyMode.ExecutionAndPublication);
        }).Value;
    }
}
Output:
Iteration 6 got 8
Iteration 7 got 6
Iteration 2 got 3
Iteration 3 got 2
Iteration 4 got 10
Iteration 8 got 9
Iteration 5 got 4
Iteration 9 got 1
Iteration 1 got 5
Iteration 0 got 7
Total value creations: 10
It seems that GetOrCreate always returns the newly created entry rather than the cached one. Luckily, that's very easy to fix:
public static T GetOrSetValueSafe<T>(IMemoryCache cache, string key, TimeSpan expiration,
    Func<T> valueFactory)
{
    if (cache.TryGetValue(key, out Lazy<T> cachedValue))
        return cachedValue.Value;

    cache.GetOrCreate(key, entry =>
    {
        entry.SetSlidingExpiration(expiration);
        return new Lazy<T>(valueFactory, LazyThreadSafetyMode.ExecutionAndPublication);
    });

    return cache.Get<Lazy<T>>(key).Value;
}
That works as expected:
Iteration 4 got 1
Iteration 9 got 1
Iteration 1 got 1
Iteration 8 got 1
Iteration 0 got 1
Iteration 6 got 1
Iteration 7 got 1
Iteration 2 got 1
Iteration 5 got 1
Iteration 3 got 1
Total value creations: 1
I just uploaded a sample library to address this issue for .Net 2.0.
Take a look at this repo:
RedisLazyCache
I'm using a Redis cache, but it also fails over to just MemoryCache if the connection string is missing.
It's based on the LazyCache library, which guarantees a single execution of the write callback when multiple threads are trying to load and save data, which matters especially if the callback is very expensive to execute.
The cache is thread safe, but like others have stated, it's possible that GetOrAdd will call the func multiple times if called from multiple threads.
Here is my minimal fix for that:
private readonly SemaphoreSlim _cacheLock = new SemaphoreSlim(1);
and
await _cacheLock.WaitAsync();
try
{
    var data = await _cache.GetOrCreateAsync(key, entry => ...);
}
finally
{
    _cacheLock.Release();
}
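One caveat: a single semaphore serializes every access, including plain cache hits. A common refinement, sketched below with assumed names (_cache, _cacheLock, MyData, and LoadDataAsync are illustrative, not from the original answer), is to check the cache first and only take the lock on a miss:

// Hedged sketch: double-checked locking around an IMemoryCache.
// _cache, _cacheLock and LoadDataAsync are assumed to exist elsewhere.
public async Task<MyData> GetDataCachedAsync(string key)
{
    // Fast path: no locking when the value is already cached.
    if (_cache.TryGetValue(key, out MyData data))
        return data;

    await _cacheLock.WaitAsync();
    try
    {
        // Re-check inside the lock; another thread may have populated it.
        return await _cache.GetOrCreateAsync(key, async entry =>
        {
            entry.SlidingExpiration = TimeSpan.FromMinutes(5);
            return await LoadDataAsync(key); // hypothetical expensive load
        });
    }
    finally
    {
        _cacheLock.Release();
    }
}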
The app needs to load data and cache it for a period of time. I would expect that if multiple parts of the app want to access the same cache key at the same time, the cache should be smart enough to only load the data once and return the result of that call to all callers. However, MemoryCache is not doing this. If you hit the cache in parallel (which often happens in the app) it creates a task for each attempt to get the cache value.

I thought that this code would achieve the desired result, but it doesn't. I would expect the cache to only run one GetDataAsync task, wait for it to complete, and use the result to get the values for other calls.
using Microsoft.Extensions.Caching.Memory;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ConsoleApp4
{
    class Program
    {
        private const string Key = "1";
        private static int number = 0;

        static async Task Main(string[] args)
        {
            var memoryCache = new MemoryCache(new MemoryCacheOptions { });
            var tasks = new List<Task>();

            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));

            await Task.WhenAll(tasks);
            Console.WriteLine($"The cached value was: {memoryCache.Get(Key)}");
        }

        public static async Task<int> GetDataAsync()
        {
            //Simulate getting a large chunk of data from the database
            await Task.Delay(3000);
            number++;
            Console.WriteLine(number);
            return number;
        }
    }
}
That's not what happens. The above displays these results (not necessarily in this order):
2
1
3
The cached value was: 3
It creates a task for each cache request and discards the values returned from the other two.
This needlessly spends time and it makes me wonder if you can say this class is even thread-safe. ConcurrentDictionary has the same behaviour. I tested it and the same thing happens.
Is there a way to achieve the desired behaviour where the task doesn't run 3 times?
MemoryCache leaves it to you to decide how to handle races to populate a cache key. In your case you don't want multiple threads to compete to populate a key, presumably because it's expensive to do that.
To coordinate the work of multiple threads like that you need a lock, but using a C# lock statement in asynchronous code can lead to thread pool starvation. Fortunately, SemaphoreSlim provides a way to do async locking, so it becomes a matter of creating a guarded memory cache that wraps an underlying IMemoryCache.
My first solution only had a single semaphore for the entire cache, putting all cache population tasks in a single queue, which isn't very smart, so instead here is a more elaborate solution with a semaphore for each cache key. Another solution could be to have a fixed number of semaphores picked by a hash of the key.
sealed class GuardedMemoryCache : IDisposable
{
    readonly IMemoryCache cache;
    readonly ConcurrentDictionary<object, SemaphoreSlim> semaphores = new();

    public GuardedMemoryCache(IMemoryCache cache) => this.cache = cache;

    public async Task<TItem> GetOrCreateAsync<TItem>(object key, Func<ICacheEntry, Task<TItem>> factory)
    {
        var semaphore = GetSemaphore(key);
        await semaphore.WaitAsync();
        try
        {
            return await cache.GetOrCreateAsync(key, factory);
        }
        finally
        {
            semaphore.Release();
            RemoveSemaphore(key);
        }
    }

    public object Get(object key) => cache.Get(key);

    public void Dispose()
    {
        // Dispose (rather than release) the per-key semaphores on shutdown
        foreach (var semaphore in semaphores.Values)
            semaphore.Dispose();
    }

    SemaphoreSlim GetSemaphore(object key) => semaphores.GetOrAdd(key, _ => new SemaphoreSlim(1));

    void RemoveSemaphore(object key)
    {
        if (semaphores.TryRemove(key, out var semaphore))
            semaphore.Dispose();
    }
}
If multiple threads try to populate the same cache key, only a single thread will actually do it. The other threads will instead return the value that was created.
Assuming that you use dependency injection, you can let GuardedMemoryCache implement IMemoryCache by adding a few more methods that forward to the underlying cache, to modify the caching behavior throughout your application with very few code changes.
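A rough sketch of that forwarding, assuming the Microsoft.Extensions.Caching.Memory IMemoryCache interface:

// Hedged sketch: declare the class as
//   sealed class GuardedMemoryCache : IMemoryCache
// (IMemoryCache already extends IDisposable) and forward the remaining
// members to the wrapped cache so only GetOrCreateAsync is guarded.
public bool TryGetValue(object key, out object value) => cache.TryGetValue(key, out value);

public ICacheEntry CreateEntry(object key) => cache.CreateEntry(key);

public void Remove(object key) => cache.Remove(key);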
There are different solutions available, the most famous of which is probably LazyCache: it's a great library.
Another one that you may find useful is FusionCache ⚡🦥, which I recently released: it has the exact same feature (although implemented differently) and much more.
The feature you are looking for is described here, and you can use it like this:
var result = await fusionCache.GetOrSetAsync(
    Key,
    async _ => await GetDataAsync(),
    TimeSpan.FromMinutes(2)
);
You may also find some of the other features interesting, like fail-safe, advanced timeouts with background factory completion and support for an optional, distributed 2nd level.
If you give it a chance, please let me know what you think.
/shameless-plug
I've got an ASP.NET Core HTTP server running on .NET 5. A library that I'm using needs to be initialized once per thread. Ideally, I'd be able to add some kind of callback so that I can run the initialization code when the ASP.NET web server starts a thread. Does such a thing exist?
The reason for this is that I need to make calls into some old code in the OCaml runtime, and OCaml requires each thread to be registered before it can call into the OCaml runtime. I'm currently doing this once per request, but I want to do it as cheaply as possible instead.
Update: it looks like ASP.NET uses the default .NET thread pool. I don't know what to do with this info yet, but if there's a way to run this callback on all threads in the thread pool, that would work for me.
This can be an expensive problem and will not scale well (at all) if the initialization has any sort of resource allocation. However, there are many ways to achieve this, e.g. a concurrent dictionary keyed by thread id; another novel thread-safe solution might be to use ThreadLocal.
Nonsensical Example
This is a contrived example; it's over-baked only to show that it works and is thread safe:
private static readonly ThreadLocal<bool> ThreadLocal = new ThreadLocal<bool>(() =>
{
    Thread.Sleep(100);
    // dll.init
    return true;
});

private static bool Check()
{
    if (!ThreadLocal.IsValueCreated)
    {
        Console.WriteLine("starting thread : " + Thread.CurrentThread.ManagedThreadId);
        return ThreadLocal.Value;
    }

    Console.WriteLine("Already Started : " + Thread.CurrentThread.ManagedThreadId);
    return false;
}
Test
for (int i = 0; i < 10; i++)
    Task.Run(Check);

Console.ReadKey();
Output
starting thread : 8
starting thread : 4
starting thread : 5
starting thread : 6
starting thread : 7
starting thread : 9
starting thread : 10
starting thread : 11
Already Started : 4
Already Started : 6
Update per comment
Essentially, ThreadLocal runs its factory once and only once per thread.
To take this a step further, you could create a per-request middleware class and add it to your pipeline:
public class CustomMiddleware
{
    private static readonly ThreadLocal<bool> ThreadLocal = new ThreadLocal<bool>(() =>
    {
        // dll.init
        // return anything you like
        return true;
    });

    private readonly RequestDelegate _next;

    public CustomMiddleware(RequestDelegate next) => _next = next;

    public async Task Invoke(HttpContext httpContext)
    {
        // use the value if you need, do anything you like really
        var value = ThreadLocal.Value;
        await _next(httpContext);
    }
}
Usage
public void Configure(IApplicationBuilder app, ...)
{
    app.UseMiddleware<CustomMiddleware>();
}
This technically doesn't answer your question, but you could keep a List, hash set, or Dictionary of registered threads. Whenever your main method(s) are called, first check whether that specific thread has been prepared yet.
private Dictionary<int, ThreadSpecificFoo> threadFooDict = new Dictionary<int, ThreadSpecificFoo>();

public void Foo()
{
    var threadId = Thread.CurrentThread.ManagedThreadId; //for the managed thread
    //var threadId = AppDomain.GetCurrentThreadId(); //for the OS thread

    if (!threadFooDict.ContainsKey(threadId))
        threadFooDict[threadId] = new ThreadSpecificFoo();

    var thisFoo = threadFooDict[threadId];
}
Something like the above could possibly work. If you can't find a way to set up an initialization trigger, this should be a decent enough workaround. If you do end up using my solution, you should probably replace the dictionary with a ConcurrentDictionary or something else that's thread safe, as in the sketch below.
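A minimal sketch of that thread-safe variant, assuming a ThreadSpecificFoo type whose constructor performs the per-thread initialization:

// Hedged sketch: GetOrAdd is atomic per key, so each thread id maps to
// exactly one ThreadSpecificFoo without explicit locking. Note that the
// value factory can still run more than once under a race; only one
// result is kept.
private static readonly ConcurrentDictionary<int, ThreadSpecificFoo> ThreadFoos = new();

public void Foo()
{
    var thisFoo = ThreadFoos.GetOrAdd(
        Thread.CurrentThread.ManagedThreadId,
        _ => new ThreadSpecificFoo()); // runs the init for unseen threads
}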
I created an extension method on Enumerable to execute an action quickly. In this method I loop over the items, and if one of them executes the action within a certain timeout, I return.
Now I want to make the output generic, because the return value of the action will differ; any advice on what to do?
The IEnumerable holds proxies. It's like load balancing: if the first doesn't respond, the second should. I want to return the output of the given Action.
public static class EnumerableExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action action, int timeOut)
    {
        foreach (T element in source)
        {
            lock (source)
            {
                // Loop for all connections and get the fastest responsive proxy
                foreach (var mxAccessProxy in source)
                {
                    try
                    {
                        // check for the health
                        Task executionTask = Task.Run(action);
                        if (executionTask.Wait(timeOut))
                        {
                            return;
                        }
                    }
                    catch
                    {
                        //ignore
                    }
                }
            }
        }
    }
}
This code runs like:
_proxies.ForEach(certainaction, timeOut);
and this will enhance performance and code readability.
No, it definitely won't :) Moreover, you introduce more problems with this code, like redundant locking and exception swallowing, while not actually executing the code in parallel.
It seems like you want to get the fastest possible call for your Action using some pool of proxy objects. You need to run the tasks asynchronously, not sequentially with .Wait().
Something like this could be helpful for you:
public static class TaskExtensions
{
    public static TReturn ParallelSelectReturnFastest<TPoolObject, TReturn>(this TPoolObject[] pool,
        Func<TPoolObject, CancellationToken, TReturn> func,
        int? timeout = null)
    {
        var ctx = new CancellationTokenSource();

        // for every object in pool schedule a task
        Task<TReturn>[] tasks = pool
            .Select(poolObject =>
            {
                ctx.Token.ThrowIfCancellationRequested();
                return Task.Factory.StartNew(() => func(poolObject, ctx.Token), ctx.Token);
            })
            .ToArray();

        // not sure if Cast is actually needed,
        // just to get rid of co-variant array conversion
        int firstCompletedIndex = timeout.HasValue
            ? Task.WaitAny(tasks.Cast<Task>().ToArray(), timeout.Value, ctx.Token)
            : Task.WaitAny(tasks.Cast<Task>().ToArray(), ctx.Token);

        // we need to cancel token to avoid unnecessary work to be done
        ctx.Cancel();

        if (firstCompletedIndex == -1) // no objects in pool managed to complete action in time
            throw new NotImplementedException(); // custom exception goes here

        return tasks[firstCompletedIndex].Result;
    }
}
Now, you can use this extension method to call a specific action on any pool of objects and get the first executed result:
var pool = new[] { 1, 2, 3, 4, 5 };
var result = pool.ParallelSelectReturnFastest((x, token) =>
{
    Thread.Sleep(x * 200);
    token.ThrowIfCancellationRequested();
    Console.WriteLine("calculate");
    return x * x;
}, 100);

Console.WriteLine(result);
It outputs:
calculate
1
The first task completes its work in 200 ms and returns it, and all other tasks are cancelled through the cancellation token.
In your case it will be something like:
var actionResponse = proxiesList.ParallelSelectReturnFastest((proxy, token) => {
token.ThrowIfCancellationRequested();
return proxy.SomeAction();
});
Some things to mention:
Make sure that your actions are safe. You can't rely on how many of them will actually execute; if the action is CreateItem, you could end up with many items created through different proxies.
It cannot guarantee that all of these actions will run in parallel, because it is up to the TPL to choose the optimal number of running tasks.
I have implemented this in the old-fashioned TPL way because your original question used it. If possible, you should switch to async/await: in that case your Func would return tasks and you would use await Task.WhenAny(tasks) instead of Task.WaitAny(), as in the sketch below.
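A rough async/await version of the same idea (a hedged sketch; the method name is illustrative, not from the original post):

// Hedged sketch of the async variant: schedules one call per pool
// object, awaits the first to finish, then cancels the rest.
public static async Task<TReturn> SelectReturnFastestAsync<TPoolObject, TReturn>(
    this TPoolObject[] pool,
    Func<TPoolObject, CancellationToken, Task<TReturn>> func)
{
    using var cts = new CancellationTokenSource();

    var tasks = pool
        .Select(poolObject => func(poolObject, cts.Token))
        .ToArray();

    // Task.WhenAny returns the first task to complete (faulted or not).
    var first = await Task.WhenAny(tasks);

    // Cancel the remaining calls to avoid unnecessary work.
    cts.Cancel();

    return await first;
}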
I am working on improving some of my code to increase efficiency. In the original code I limited the number of threads to 5, and if I already had 5 active threads I would wait until one finished before starting another. Now I want to modify this code to allow any number of threads, but I want to be able to make sure that only 5 threads get started every second. For example:
Second 0 - 5 new threads
Second 1 - 5 new threads
Second 2 - 5 new threads ...
Original code (cleanseDictionary usually contains thousands of items):
ConcurrentDictionary<long, APIResponse> cleanseDictionary = new ConcurrentDictionary<long, APIResponse>();
ConcurrentBag<int> itemsinsec = new ConcurrentBag<int>();
ConcurrentDictionary<long, string> resourceDictionary = new ConcurrentDictionary<long, string>();
DateTime start = DateTime.Now;

Parallel.ForEach(resourceDictionary, new ParallelOptions { MaxDegreeOfParallelism = 5 }, row =>
{
    lock (itemsinsec)
    {
        ThrottleAPIRequests(itemsinsec, start);
        itemsinsec.Add(1);
    }

    cleanseDictionary.TryAdd(row.Key, _helper.MakeAPIRequest(string.Format("/endpoint?{0}", row.Value)));
});

private static void ThrottleAPIRequests(ConcurrentBag<int> itemsinsec, DateTime start)
{
    if ((start - DateTime.Now).Milliseconds < 10001 && itemsinsec.Count > 4)
    {
        System.Threading.Thread.Sleep(1000 - (start - DateTime.Now).Milliseconds);
        start = DateTime.Now;
        itemsinsec = new ConcurrentBag<int>();
    }
}
My first thought was to increase the MaxDegreeOfParallelism to something much higher and then have a helper method that limits to only 5 threads in a second, but I am not sure if that is the best way to do it, and if it is, I would probably need a lock around that step.
Thanks in advance!
EDIT
I am actually looking for a way to throttle the API requests rather than the actual threads. I was thinking they were one and the same.
Edit 2: My requirement is to send over 5 API requests every second.
"Parallel.ForEach" from the MS website
may run in parallel
If you want any degree of fine control over how the threads are managed, this is not the way.
How about creating your own helper class where you can queue jobs with a group id, allows you to wait for all jobs of group id X to complete, and it spawns extra threads as and when required?
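Something like this (a hedged sketch; all names here are illustrative, not an existing API):

// Hedged sketch: queue jobs under a group id and wait per group.
// Task.Run borrows thread pool threads as and when required.
public class GroupedJobQueue
{
    private readonly ConcurrentDictionary<string, ConcurrentBag<Task>> _groups = new();

    public void Enqueue(string groupId, Action job)
    {
        var tasks = _groups.GetOrAdd(groupId, _ => new ConcurrentBag<Task>());
        tasks.Add(Task.Run(job));
    }

    public void WaitForGroup(string groupId)
    {
        // Blocks until every job queued under this group id finishes.
        if (_groups.TryGetValue(groupId, out var tasks))
            Task.WaitAll(tasks.ToArray());
    }
}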
For me the best solution is:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace SomeNamespace
{
    public class RequestLimiter : IRequestLimiter
    {
        private readonly ConcurrentQueue<DateTime> _requestTimes;
        private readonly TimeSpan _timeSpan;
        private readonly object _locker = new object();

        public RequestLimiter()
        {
            _timeSpan = TimeSpan.FromSeconds(1);
            _requestTimes = new ConcurrentQueue<DateTime>();
        }

        public TResult Run<TResult>(int requestsOnSecond, Func<TResult> function)
        {
            WaitUntilRequestCanBeMade(requestsOnSecond).Wait();
            return function();
        }

        private Task WaitUntilRequestCanBeMade(int requestsOnSecond)
        {
            return Task.Factory.StartNew(() =>
            {
                while (!TryEnqueueRequest(requestsOnSecond).Result) ;
            });
        }

        private Task SynchronizeQueue()
        {
            return Task.Factory.StartNew(() =>
            {
                _requestTimes.TryPeek(out var first);
                while (_requestTimes.Count > 0 && (first.Add(_timeSpan) < DateTime.UtcNow))
                    _requestTimes.TryDequeue(out _);
            });
        }

        private Task<bool> TryEnqueueRequest(int requestsOnSecond)
        {
            lock (_locker)
            {
                SynchronizeQueue().Wait();
                if (_requestTimes.Count < requestsOnSecond)
                {
                    _requestTimes.Enqueue(DateTime.UtcNow);
                    return Task.FromResult(true);
                }

                return Task.FromResult(false);
            }
        }
    }
}
I want to be able to send over 5 API requests every second
That's really easy:
while (true)
{
    await Task.Delay(TimeSpan.FromSeconds(1));
    await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));
}
Maybe not the best approach, since there will be a burst of requests rather than a continuous flow.
Also, there is timing skew: one iteration takes more than 1 second. This can be solved with a few lines of timing logic, as sketched below.
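One hedged way to fix the skew with a System.Diagnostics.Stopwatch (RunRequestAsync is the hypothetical request method from above; note the fire-and-forget trades completion tracking for steady pacing):

// Hedged sketch: pin each batch of 5 to a one-second boundary so the
// request rate stays at 5 per second even when requests are slow.
var stopwatch = Stopwatch.StartNew();
var window = TimeSpan.FromSeconds(1);
var nextWindow = window;

while (true)
{
    // Fire the batch without awaiting it, so slow responses don't
    // push back the next second's batch.
    _ = Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));

    // Wait only for the remainder of the current one-second window.
    var delay = nextWindow - stopwatch.Elapsed;
    if (delay > TimeSpan.Zero)
        await Task.Delay(delay);

    nextWindow += window;
}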
I'm hoping to find some advice on the best way to fetch a bunch of id values (like database identity values) before I need them. I have a number of classes that require a unique id (int), and what I'd like to do is fetch the next available id (per class, per server) and have it cached locally, ready. When an id is taken I want to get the next one ready, and so on.
I've produced some code to demonstrate what I am trying to do. The code is terrible (it should contain locks, etc.) but I think it gets the point across. Losing the odd id is not a problem; a duplicate id is. I'm happy with the guts of GetNextIdAsync - it calls a proc
this.Database.SqlQuery<int>("EXEC EntityNextIdentityValue @Key",
    new SqlParameter("Key", key)).First();
on SQL Server that uses sp_getapplock to ensure each return value is unique (and incremental).
static class ClassId
{
    static private Dictionary<string, int> _ids = new Dictionary<string, int>();
    static private Dictionary<string, Thread> _threads = new Dictionary<string, Thread>();

    static ClassId()
    {
        //get the first NextId for all known classes
        StartGetNextId("Class1");
        StartGetNextId("Class2");
        StartGetNextId("Class3");
    }

    static public int NextId(string key)
    {
        //wait for a current call for nextId to finish
        while (_threads.ContainsKey(key)) { }

        //get the current nextId
        int nextId = _ids[key];

        //start the call for the next nextId
        StartGetNextId(key);

        //return the current nextId
        return nextId;
    }

    static private void StartGetNextId(string key)
    {
        _threads.Add(key, new Thread(() => GetNextIdAsync(key)));
        _threads[key].Start();
    }

    static private void GetNextIdAsync(string key)
    {
        //call the long running task to get the next available value
        Thread.Sleep(1000);
        if (_ids.ContainsKey(key)) _ids[key] += 1;
        else _ids.Add(key, 1);
        _threads.Remove(key);
    }
}
My question is - what is the best way to always have the next value I'm going to need before I need it? How should the class be arranged, and where should the locks be? E.g. lock inside GetNextIdAsync(), add the new thread but don't start it, and change StartGetNextId() to call .Start()?
You should have your database generate the identity values by marking that column appropriately. You can retrieve that value with SCOPE_IDENTITY or similar.
The main failings of your implementation are the busy wait in NextId and accessing the Dictionary simultaneously from multiple threads. The simplest solution would be to use a BlockingCollection, like ohadsc suggests below. You'll need to anticipate the case where your database goes down and you can't get more ids; you don't want to deadlock your application. So you would want to use the Take() overload that accepts a CancellationToken, which you would cancel in the event that accessing the database fails, as sketched below.
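A small sketch of that cancellation pattern (the field names are assumptions for illustration):

// Hedged sketch: cancel the token from the producer when the database
// becomes unreachable, so consumers don't block forever on Take().
private readonly BlockingCollection<int> _idBuffer = new BlockingCollection<int>(boundedCapacity: 10);
private readonly CancellationTokenSource _dbDown = new CancellationTokenSource();

public int NextId()
{
    try
    {
        // Blocks until an id is available or _dbDown is cancelled.
        return _idBuffer.Take(_dbDown.Token);
    }
    catch (OperationCanceledException)
    {
        throw new InvalidOperationException("Id source is unavailable.");
    }
}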
This seems like a good application for a producer-consumer pattern.
I'm thinking something like:
private ConcurrentDictionary<string, int> _ids;
private Task _producer;
private Task _consumer;
private CancellationTokenSource _cancellation;

private void StartProducer()
{
    _producer = Task.Factory.StartNew(() =>
    {
        while (_cancellation.Token.IsCancellationRequested == false)
        {
            // fetch the next key/id pair and buffer it
            var (key, id) = GetNextKeyValuePair();
            _ids.TryAdd(key, id);
        }
    });
}

private void StartConsumer()
{
    _consumer = Task.Factory.StartNew(() =>
    {
        while (_cancellation.Token.IsCancellationRequested == false)
        {
            // drain buffered ids and use them
            foreach (var pair in _ids)
            {
                UseNextId(pair.Value);
                _ids.TryRemove(pair.Key, out _);
            }
        }
    });
}
A few things to point out...
Firstly, and you probably know this already, it's very important to use thread-safe collections like ConcurrentDictionary or BlockingCollection instead of plain Dictionary or List. If you don't do this, bad things will happen, people will die and babies will cry.
Second, you might need something a little less ham-fisted than the basic CancellationTokenSource; that's just what I'm used to from my service programming. The point is to have some way to cancel these things so you can shut them down gracefully.
Thirdly, consider throwing sleeps in there to keep it from pounding the processor too hard.
The particulars of this will vary based on how fast you can generate these things as opposed to how fast you can consume them. My code gives absolutely no guarantee that you will have the ID you want before the consumer asks for it, if the consumer is running at a much higher speed than the producer. However, this is a decent, albeit basic, way to organize the preparation of this sort of data concurrently.
You could use a BlockingCollection for this. Basically you'll have a thread pumping new IDs into a buffer:
BlockingCollection<int> _queue = new BlockingCollection<int>(BufferSize);

void Init()
{
    Task.Factory.StartNew(PopulateIdBuffer, TaskCreationOptions.LongRunning);
}

void PopulateIdBuffer()
{
    int id = 0;
    while (true)
    {
        Thread.Sleep(1000); //Simulate long retrieval
        _queue.Add(id++);
    }
}

void SomeMethodThatNeedsId()
{
    var nextId = _queue.Take();
    ....
}