I have a 3rd party component that is "expensive" to spin up. This component is not thread safe. Said component is hosted inside of a WCF service (for now), so... every time a call comes into the service I have to new up the component.
What I'd like to do instead is have a pool of say 16 threads that each spin up their own copy of the component and have a mechanism to call the method and have it distributed to one of the 16 threads and have the value returned.
So something simple like:
var response = threadPool.CallMethod(param1, param2);
It's fine for the call to block until it gets a response, as I need the response to proceed.
Any suggestions? Maybe I'm overthinking it and a ConcurrentQueue serviced by 16 threads would do the job, but I'm not sure how the method's return value would get back to the caller.
WCF will already use the thread pool to manage its resources so if you add a layer of thread management on top of that it is only going to go badly. Avoid doing that if possible as you will get contention on your service calls.
What I would do in your situation is just use a single ThreadLocal or thread static that would get initialized with your expensive object once. Thereafter it would be available to the thread pool thread.
That is assuming that your object is fine on an MTA thread; I'm guessing it is from your post, since it sounds like things are currently working, but just slow.
There is the concern that too many objects get created and you use too much memory as the pool grows too large. However, see if this is the case in practice before doing anything else. This is a very simple strategy to implement, so it's easy to trial. Only get more complex if you really need to.
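To make that concrete, here is a minimal sketch of the thread-local approach inside a service method; MyComponent, its DoWork method, and the service contract are stand-ins I made up for the component from the question:

public class MyService : IMyService
{
    // One component per thread: the factory runs the first time a given
    // thread touches Value, so each thread-pool thread builds its own copy.
    private static readonly ThreadLocal<MyComponent> component =
        new ThreadLocal<MyComponent>(() => new MyComponent());

    public string CallMethod(string param1, string param2)
    {
        // No locking needed: this thread is the only one using this instance.
        return component.Value.DoWork(param1, param2);
    }
}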
First and foremost, I agree with #briantyler: ThreadLocal<T> or thread static fields is probably what you want. You should go with that as a starting point and consider other options if it doesn't meet your needs.
A complicated but flexible alternative is a singleton object pool. In its most simple form your pool type will look like this:
public sealed class ObjectPool<T>
{
    private readonly ConcurrentQueue<T> __objects = new ConcurrentQueue<T>();
    private readonly Func<T> __factory;

    public ObjectPool(Func<T> factory)
    {
        __factory = factory;
    }

    public T Get()
    {
        T obj;
        return __objects.TryDequeue(out obj) ? obj : __factory();
    }

    public void Return(T obj)
    {
        __objects.Enqueue(obj);
    }
}
This doesn't seem awfully useful if you're thinking of type T in terms of plain classes or structs (e.g. ObjectPool<MyComponent>), as the pool does not have any threading controls built in. But you can substitute your type T for a Lazy<T> or Task<T> monad, and get exactly what you want.
Pool initialisation:
Func<Task<MyComponent>> factory = () => Task.Run(() => new MyComponent());
ObjectPool<Task<MyComponent>> pool = new ObjectPool<Task<MyComponent>>(factory);

// "Pre-warm" the pool with 16 concurrent tasks. This starts the tasks
// on the thread pool and returns immediately without blocking.
// (Note: pool.Return(pool.Get()) would not work here, because each Get
// would just dequeue the task returned by the previous iteration.)
for (int i = 0; i < 16; i++) {
    pool.Return(factory());
}
Usage:
// Get a pooled task or create a new one. The task may
// have already completed, in which case Result will
// be available immediately. If the task is still
// in flight, accessing its Result will block.
Task<MyComponent> task = pool.Get();
try
{
    MyComponent component = task.Result; // Alternatively you can "await task"
    // Do something with component.
}
finally
{
    pool.Return(task);
}
This method is more complex than maintaining your component in a ThreadLocal or thread static field, but if you need to do something fancy like limiting the number of pooled instances, the pool abstraction can be quite useful.
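For comparison, here is what the same pool looks like with Lazy<T> instead of Task<T>; this is just a sketch, and it assumes MyComponent may be constructed on whichever thread first touches Value:

Func<Lazy<MyComponent>> factory = () => new Lazy<MyComponent>(() => new MyComponent());
ObjectPool<Lazy<MyComponent>> pool = new ObjectPool<Lazy<MyComponent>>(factory);

Lazy<MyComponent> lazy = pool.Get();
try
{
    // Unlike the Task<T> version, construction is deferred to the first
    // caller rather than being kicked off up front on the thread pool.
    MyComponent component = lazy.Value;
    // Do something with component.
}
finally
{
    pool.Return(lazy);
}

The trade-off is warm-up: Task<T> lets you pay the construction cost in the background before the first request arrives, while Lazy<T> pays it on the first call.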
EDIT
Basic "fixed set of X instances" pool implementation with a Get which blocks once the pool has been drained:
public sealed class ObjectPool<T>
{
    private readonly Queue<T> __objects;

    public ObjectPool(IEnumerable<T> items)
    {
        __objects = new Queue<T>(items);
    }

    public T Get()
    {
        lock (__objects)
        {
            while (__objects.Count == 0) {
                Monitor.Wait(__objects);
            }
            return __objects.Dequeue();
        }
    }

    public void Return(T obj)
    {
        lock (__objects)
        {
            __objects.Enqueue(obj);
            Monitor.Pulse(__objects);
        }
    }
}
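To tie this back to the question, the fixed pool could be seeded with exactly 16 lazily-constructed components; a sketch (MyComponent and CallMethod are assumed from the question, and Enumerable.Range needs System.Linq):

var pool = new ObjectPool<Lazy<MyComponent>>(
    Enumerable.Range(0, 16).Select(_ => new Lazy<MyComponent>(() => new MyComponent())));

// Get blocks while all 16 items are checked out, which caps both
// memory use and the number of live component instances.
Lazy<MyComponent> item = pool.Get();
try
{
    var response = item.Value.CallMethod(param1, param2);
}
finally
{
    pool.Return(item);
}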
Related
The app needs to load data and cache it for a period of time. I would expect that if multiple parts of the app want to access the same cache key at the same time, the cache should be smart enough to only load the data once and return the result of that call to all callers. However, MemoryCache is not doing this. If you hit the cache in parallel (which often happens in the app) it creates a task for each attempt to get the cache value. I thought that this code would achieve the desired result, but it doesn't. I would expect the cache to only run one GetDataAsync task, wait for it to complete, and use the result to get the values for other calls.
using Microsoft.Extensions.Caching.Memory;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ConsoleApp4
{
    class Program
    {
        private const string Key = "1";
        private static int number = 0;

        static async Task Main(string[] args)
        {
            var memoryCache = new MemoryCache(new MemoryCacheOptions { });
            var tasks = new List<Task>();
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            await Task.WhenAll(tasks);
            Console.WriteLine($"The cached value was: {memoryCache.Get(Key)}");
        }

        public static async Task<int> GetDataAsync()
        {
            // Simulate getting a large chunk of data from the database
            await Task.Delay(3000);
            number++;
            Console.WriteLine(number);
            return number;
        }
    }
}
That's not what happens. The above displays these results (not necessarily in this order):
2
1
3
The cached value was: 3
It creates a task for each cache request and discards the values returned from the other two.
This needlessly spends time and it makes me wonder if you can say this class is even thread-safe. ConcurrentDictionary has the same behaviour. I tested it and the same thing happens.
Is there a way to achieve the desired behaviour where the task doesn't run 3 times?
MemoryCache leaves it to you to decide how to handle races to populate a cache key. In your case you don't want multiple threads to compete to populate a key presumably because it's expensive to do that.
To coordinate the work of multiple threads like that you need a lock, but using a C# lock statement in asynchronous code can lead to thread pool starvation. Fortunately, SemaphoreSlim provides a way to do async locking so it becomes a matter of creating a guarded memory cache that wraps an underlying IMemoryCache.
My first solution had a single semaphore for the entire cache, putting all cache population tasks in a single queue, which isn't very smart. Instead, here is a more elaborate solution with a semaphore for each cache key. Another option would be a fixed number of semaphores picked by a hash of the key.
using Microsoft.Extensions.Caching.Memory;
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

sealed class GuardedMemoryCache : IDisposable
{
    readonly IMemoryCache cache;
    readonly ConcurrentDictionary<object, SemaphoreSlim> semaphores = new();

    public GuardedMemoryCache(IMemoryCache cache) => this.cache = cache;

    public async Task<TItem> GetOrCreateAsync<TItem>(object key, Func<ICacheEntry, Task<TItem>> factory)
    {
        var semaphore = GetSemaphore(key);
        await semaphore.WaitAsync();
        try
        {
            return await cache.GetOrCreateAsync(key, factory);
        }
        finally
        {
            semaphore.Release();
            RemoveSemaphore(key);
        }
    }

    public object Get(object key) => cache.Get(key);

    public void Dispose()
    {
        foreach (var semaphore in semaphores.Values)
            semaphore.Dispose();
    }

    SemaphoreSlim GetSemaphore(object key) => semaphores.GetOrAdd(key, _ => new SemaphoreSlim(1));

    void RemoveSemaphore(object key)
    {
        // Remove the semaphore but let the GC reclaim it: disposing it here
        // could throw in another thread that still holds a reference and is
        // about to call WaitAsync or Release on it.
        semaphores.TryRemove(key, out _);
    }
}
If multiple threads try to populate the same cache key only a single thread will actually do it. The other threads will instead return the value that was created.
Assuming that you use dependency injection, you can let GuardedMemoryCache implement IMemoryCache by adding a few more methods that forward to the underlying cache to modify the caching behavior throughout your application with very few code changes.
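As a sketch of how this slots into the code from your question (reusing your Key and GetDataAsync):

var memoryCache = new MemoryCache(new MemoryCacheOptions());
using var guardedCache = new GuardedMemoryCache(memoryCache);

var tasks = new List<Task>
{
    guardedCache.GetOrCreateAsync(Key, _ => GetDataAsync()),
    guardedCache.GetOrCreateAsync(Key, _ => GetDataAsync()),
    guardedCache.GetOrCreateAsync(Key, _ => GetDataAsync())
};
await Task.WhenAll(tasks);

// GetDataAsync runs once; the other two calls observe the cached value.
Console.WriteLine($"The cached value was: {guardedCache.Get(Key)}");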
There are different solutions available, the most famous of which is probably LazyCache: it's a great library.
Another one that you may find useful is FusionCache ⚡🦥, which I recently released: it has the exact same feature (although implemented differently) and much more.
The feature you are looking for is described here and you can use it like this:
var result = await fusionCache.GetOrSetAsync(
    Key,
    async _ => await GetDataAsync(),
    TimeSpan.FromMinutes(2)
);
You may also find some of the other features interesting, like fail-safe, advanced timeouts with background factory completion and support for an optional, distributed 2nd level.
If you give it a chance, please let me know what you think.
/shameless-plug
I've got an ASP.NET Core HTTP-server running in .NET 5. A library that I'm using needs to be initialized once per thread. Ideally, I'd be able to add some kind of callback so that I can call the initialization code when the ASP.NET web server starts a thread. Does such a thing exist?
The reason for this is that I need to make calls into some old code in the OCaml runtime, and OCaml requires each thread to be registered to call into the OCaml runtime. I'm currently doing this once per request, but I want to do this as cheaply as possible instead.
Update: looks like ASP.NET uses the default .NET threadpool. Don't know what to do with this info yet, but if there's a way to run this callback on all threads in the threadpool, that would work for me.
This can be an expensive problem and will not scale well (at all) if the initialization does any sort of resource allocation. However, there are many ways to achieve this, e.g. a concurrent dictionary keyed by thread id; another novel thread-safe solution might be to use ThreadLocal.
Nonsensical Example
This is a contrived example, it's over-baked to only show that it works and is thread safe:
private static readonly ThreadLocal<bool> ThreadLocal = new ThreadLocal<bool>(() =>
{
    Thread.Sleep(100);
    // dll.init
    return true;
});

private static bool Check()
{
    if (!ThreadLocal.IsValueCreated)
    {
        Console.WriteLine("starting thread : " + Thread.CurrentThread.ManagedThreadId);
        return ThreadLocal.Value;
    }
    Console.WriteLine("Already Started : " + Thread.CurrentThread.ManagedThreadId);
    return false;
}
Test
for (int i = 0; i < 10; i++)
    Task.Run(Check);

Console.ReadKey();
Output
starting thread : 8
starting thread : 4
starting thread : 5
starting thread : 6
starting thread : 7
starting thread : 9
starting thread : 10
starting thread : 11
Already Started : 4
Already Started : 6
Update per comment
Essentially the ThreadLocal value factory runs once and only once per thread.
To take this a step further, you could create a per request middleware class and add it to your pipeline:
public class CustomMiddleware
{
    private static readonly ThreadLocal<bool> ThreadLocal = new ThreadLocal<bool>(() =>
    {
        // dll.init
        // return anything you like
        return true;
    });

    private readonly RequestDelegate _next;

    public CustomMiddleware(RequestDelegate next) => _next = next;

    public async Task Invoke(HttpContext httpContext)
    {
        // use the value if you need, do anything you like really
        var value = ThreadLocal.Value;
        await _next(httpContext);
    }
}
Usage
public void Configure(IApplicationBuilder app, ...)
{
    app.UseMiddleware<CustomMiddleware>();
}
This technically doesn't answer your question, but you could have a List or hash or Dictionary of registered threads. Whenever your main method(s) are called, at the start do a check to see if that specific thread has been prepared yet.
private Dictionary<int, ThreadSpecificFoo> threadFooDict = new Dictionary<int, ThreadSpecificFoo>();

public void Foo()
{
    var threadId = Thread.CurrentThread.ManagedThreadId; // for the managed thread
    //var threadId = AppDomain.GetCurrentThreadId(); // for the OS thread
    if (!threadFooDict.ContainsKey(threadId))
        threadFooDict[threadId] = new ThreadSpecificFoo();
    var thisFoo = threadFooDict[threadId];
}
Something like the above could possibly work. If you can't find a way to set up an initialization trigger, this should be a decent enough workaround. If you do end up using my solution, you should probably replace the dictionary with a concurrent dictionary or something else that's thread-safe.
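With that swap made, a sketch using ConcurrentDictionary.GetOrAdd could look like this (ThreadSpecificFoo is the hypothetical per-thread state from above; ConcurrentDictionary lives in System.Collections.Concurrent):

private readonly ConcurrentDictionary<int, ThreadSpecificFoo> threadFooDict =
    new ConcurrentDictionary<int, ThreadSpecificFoo>();

public void Foo()
{
    // GetOrAdd is atomic per key, so each thread id maps to exactly one
    // stored entry. Note the value factory itself may run more than once
    // under a race, so keep it free of side effects.
    var thisFoo = threadFooDict.GetOrAdd(
        Thread.CurrentThread.ManagedThreadId,
        _ => new ThreadSpecificFoo());
}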
I have a long running request to a web service which should be cached on the server side after completion. My problem is - I don't know how to prevent it being called concurrently/simultaneously before it's cached after first request.
My thought is I should create a data request Task and store it in a concurrent dictionary. So every other request should check if Task is already running and wait for it to complete.
I've ended up with this:
private static ConcurrentDictionary<string, Task> tasksCache = new ConcurrentDictionary<string, Task>();

public static T GetFromCache<T>(this ICacheManager<object> cacheManager, string name, Func<T> func)
{
    if (cacheManager.Exists(name))
        return (T)cacheManager[name];

    if (tasksCache.ContainsKey(name))
    {
        tasksCache[name].Wait();
        return (tasksCache[name] as Task<T>).Result;
    }

    var runningTask = Task.Run(() => func.Invoke());
    tasksCache[name] = runningTask;
    runningTask.Wait();

    var data = runningTask.Result;
    cacheManager.Put(name, data);
    tasksCache.TryRemove(name, out Task t);
    return data;
}
But this looks messy. Is there a better way?
I'd consider wrapping these in a Lazy<T> for each task, which has built-in semantics for controlling concurrent initialization.
This example demonstrates the use of the Lazy<T> class to provide lazy initialization with access from multiple threads.
You'll want to specify an appropriate LazyThreadSafetyMode.
ExecutionAndPublication: Fully thread safe; uses locking to ensure that only one thread initializes the value.
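A sketch of how that could look applied to your method; the ICacheManager parts are kept as in your question, and the Lazy wrapper plus GetOrAdd is what guarantees func runs at most once per key:

private static ConcurrentDictionary<string, Lazy<Task<object>>> tasksCache =
    new ConcurrentDictionary<string, Lazy<Task<object>>>();

public static T GetFromCache<T>(this ICacheManager<object> cacheManager, string name, Func<T> func)
{
    if (cacheManager.Exists(name))
        return (T)cacheManager[name];

    // Lazy with ExecutionAndPublication ensures only the first caller
    // starts the task; everyone else waits on the same Task instance.
    var lazy = tasksCache.GetOrAdd(name, _ =>
        new Lazy<Task<object>>(
            () => Task.Run(() => (object)func()),
            LazyThreadSafetyMode.ExecutionAndPublication));

    var data = (T)lazy.Value.Result;
    cacheManager.Put(name, data);
    tasksCache.TryRemove(name, out _);
    return data;
}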
We are using a proprietary API that requires synchronization of data at some point.
I've thought about some ways of ensuring data consistency but am eager to get more input on better solutions.
Here is a long-running Task outlining the API syncing:
new Task(() =>
{
    while (true)
    {
        // Here other threads can access any API object (that's fine)
        API.CriticalOperationStart(); // Between start and end no API Object may be used
        API.CriticalOperationEnd();
        // Here other threads can access any API object (that's fine too)
    }
}, TaskCreationOptions.LongRunning).Start();
This is a separate task that actually does some data syncing.
The area between Start and End is critical. No other API call may be done while the API is in this critical step.
Here are some non guarded Threads using distinct API Objects:
// multiple calls to different API objects should not be exclusive
OtherThread1
APIObject1.SetData(42);
OtherThread2
APIObject2.SetData(43);
Constraints:
No APIObject method is allowed to be called while the API is in the critical section.
Both SetData calls are allowed to be done simultaneously. They do not interfere with each other, only with the critical section.
Generally speaking, accessing one APIObject from multiple threads is not thread-safe, but accessing multiple APIObjects does not interfere with the API except during the critical section.
The critical section must never be executed while any APIObject Method is used.
Guarding access to one APIObject from multiple threads is not required.
The trivial approach
Use a lock Object and lock the critical section and every call to API Objects.
This would effectively work, but it creates a lot of unnecessary contention, because then only one APIObject at a time could be accessed as well.
Concurrent container of Actions
Use a single instance of a concurrent container where each modification of an APIObject is placed into a thread-safe container and executed by the task above, which traverses the container outside the critical section and invokes all the actions. (Not a consumer pattern: waiting for new entries must not block the task, since the critical section has to be executed periodically.)
This imposes some drawbacks. Closure issues when capturing contexts could be one. Another is that reading from an APIObject returns stale data as long as the actions in the container have not been executed. It gets even worse if the creation of an APIObject is put in the container and subsequent code assumes it has already been created.
Make something up with Wait Handles and atomic increments
Every APIObject access could be guarded with a ManualResetEvent. The critical section would wait for the signal to be set by the APIObjects, the signal would only be set when all calls to APIObjects have finished (some sort of atomic increments/decrement around accessing APIObjects).
Sounds like a great recipe for deadlocks. It may lock out the critical section for long periods of time when continuous APIObject calls prevent the signal from ever being set.
Does not solve the problem that APIObjects may not be accessed during critical section since this construct only guards in the other direction.
Requires additional locking (e.g. Monitor.IsEntered on the critical section to avoid locking out simultaneous calls to distinct APIObjects).
=> Awful way, making a complex situation even more complex
If copying an APIObject is relatively inexpensive (or if it's moderately expensive and you don't sync very often), then you can put the objects in a wrapper that contains a singleton global_timestamp and a local_timestamp. When you update an object, first check the global timestamp: if global_timestamp == long.MaxValue, no sync is in progress, so destructively update the object in place; likewise, if global_timestamp != long.MaxValue but global_timestamp == local_timestamp, the object has already been copied for this sync, so update it in place. However, if global_timestamp != long.MaxValue and global_timestamp != local_timestamp, return an updated copy of the object and set local_timestamp = global_timestamp. When you perform a sync, use an interlocked update to set global_timestamp = DateTime.UtcNow.ToBinary(), and when the sync is complete set it back to long.MaxValue. This way the rest of the program doesn't have to pause while a sync is performed, and the sync sees consistent data.
// APIObject provided to you
public class APIObject {
    private string foo;
    public void setFoo(string _foo) {
        this.foo = _foo;
    }
}

// Global timestamp: read-only version for wrappers, read-write version for sync
public class GlobalTimestamp {
    protected long timestamp = long.MaxValue;
    public long getTimestamp() {
        // Interlocked.Read avoids torn reads of the 64-bit value on 32-bit platforms
        return System.Threading.Interlocked.Read(ref timestamp);
    }
}

public class GlobalTimestampRW : GlobalTimestamp {
    public void startSync(long _timestamp) {
        long value = System.Threading.Interlocked.CompareExchange(ref timestamp, _timestamp, long.MaxValue);
        if (value != long.MaxValue) throw new InvalidOperationException(); // somebody else called this method already
    }
    public void endSync(long _timestamp) {
        long value = System.Threading.Interlocked.CompareExchange(ref timestamp, long.MaxValue, _timestamp);
        if (value != _timestamp) throw new InvalidOperationException(); // somebody else called this method already
    }
}

// Wrapper
public class APIWrapper {
    private APIObject apiObject;
    private GlobalTimestamp globalTimestamp;
    private long localTimestamp = long.MinValue;

    public APIObject setFoo(string _foo) {
        long tempGlobalTimestamp = globalTimestamp.getTimestamp();
        if (tempGlobalTimestamp == long.MaxValue || tempGlobalTimestamp == localTimestamp) {
            apiObject.setFoo(_foo);
            return apiObject;
        } else {
            apiObject = apiObject.copy(); // copy() is assumed to exist on the real APIObject
            apiObject.setFoo(_foo);
            localTimestamp = tempGlobalTimestamp;
            return apiObject;
        }
    }
}
GlobalTimestampRW globalTimestamp;

new Task(() =>
{
    while (true)
    {
        long timestamp = DateTime.UtcNow.ToBinary();
        globalTimestamp.startSync(timestamp);
        API.CriticalOperationStart(); // Between start and end no API Object may be used
        API.CriticalOperationEnd();
        globalTimestamp.endSync(timestamp);
    }
}, TaskCreationOptions.LongRunning).Start();
Current implementation: waits until parallelCount values are collected, uses the ThreadPool to process the values, waits until all threads complete, re-collects another set of values, and so on...
Code:
private static int parallelCount = 5;
private int taskIndex;
private object[] paramObjects = new object[parallelCount];

// Each ThreadPool thread should access only one item of the array,
// release object when done, to be used by another thread
private object[] reusableObjects = new object[parallelCount];

// Signals when all queued work items of the current batch have finished
private ManualResetEvent resetEvent = new ManualResetEvent(false);

private void MultiThreadedGenerate(object paramObject)
{
    paramObjects[taskIndex] = paramObject;
    taskIndex++;
    if (taskIndex == parallelCount)
    {
        MultiThreadedGenerate();
        // Reset
        taskIndex = 0;
    }
}

/*
 * Called when 'paramObjects' array gets filled
 */
private void MultiThreadedGenerate()
{
    int remainingToGenerate = paramObjects.Length;
    resetEvent.Reset();

    for (int i = 0; i < paramObjects.Length; i++)
    {
        ThreadPool.QueueUserWorkItem(delegate(object obj)
        {
            try
            {
                int currentIndex = (int)obj;
                Generate(currentIndex, paramObjects[currentIndex], reusableObjects[currentIndex]);
            }
            finally
            {
                if (Interlocked.Decrement(ref remainingToGenerate) == 0)
                {
                    resetEvent.Set();
                }
            }
        }, i);
    }

    resetEvent.WaitOne();
}
I've seen significant performance improvements with this approach, however there are a number of issues to consider:
[1] Collecting values in paramObjects and synchronization using resetEvent can be avoided, as there is no dependency between the threads (or between the current set of values and the next set). I'm only doing this to manage access to reusableObjects (when a set of paramObjects is done processing, I know that all objects in reusableObjects are free, so taskIndex is reset and each new task of the next set of values will have its own unique reusableObj to work with).
[2] There is no real connection between the size of reusableObjects and the number of threads the ThreadPool uses. I might initialize reusableObjects to have 10 objects, and say due to some limitations, ThreadPool can run only 3 threads for my MultiThreadedGenerate() method, then I'm wasting memory.
So by getting rid of paramObjects, how can the above code be refined in a way that as soon as one thread completes its job, that thread returns the taskIndex (or the reusableObj) it used and no longer needs, so that it becomes available to the next value? Also, the code should create a reusableObject and add it to some collection only when there is demand for it. Is using a Queue here a good idea?
Thank you.
There's really no reason to do your own manual threading and task management any more. You could restructure this to a more loosely-coupled model using Task Parallel Library (and possibly System.Collections.Concurrent for result collation).
Performance could be further improved if you don't need to wait for a full complement of work before handing off each Task for processing.
TPL came along in .NET 4.0 but was back-ported to .NET 3.5.
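As a rough sketch of what that restructuring might look like (Generate, CreateReusableObject, and the source sequence are adapted from the question, so treat the signatures as assumptions):

// A shared pool of reusable objects; MaxDegreeOfParallelism decides
// how many are actually in use at any one time, so the pool never
// grows beyond the real demand.
var reusables = new ConcurrentBag<object>();

Parallel.ForEach(
    paramObjectsSource, // an IEnumerable<object> of incoming work items
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    paramObject =>
    {
        // Take a free reusable object, or create one on demand.
        if (!reusables.TryTake(out var reusable))
            reusable = CreateReusableObject();

        try
        {
            Generate(paramObject, reusable);
        }
        finally
        {
            reusables.Add(reusable); // hand it back for the next item
        }
    });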