C# - Pass data back from ThreadPool thread to main thread

Current implementation: waits until parallelCount values are collected, uses the ThreadPool to process the values, waits until all threads complete, re-collects another set of values, and so on...
Code:
private static int parallelCount = 5;
private int taskIndex;
private object[] paramObjects = new object[parallelCount];
// Each ThreadPool thread should access only one item of the array,
// release object when done, to be used by another thread
private object[] reusableObjects = new object[parallelCount];
// Signalled when a whole batch of parallelCount tasks has finished
private ManualResetEvent resetEvent = new ManualResetEvent(false);

private void MultiThreadedGenerate(object paramObject)
{
    paramObjects[taskIndex] = paramObject;
    taskIndex++;
    if (taskIndex == parallelCount)
    {
        MultiThreadedGenerate();
        // Reset
        taskIndex = 0;
    }
}
/*
 * Called when 'paramObjects' array gets filled
 */
private void MultiThreadedGenerate()
{
    int remainingToGenerate = paramObjects.Length;
    resetEvent.Reset();
    for (int i = 0; i < paramObjects.Length; i++)
    {
        ThreadPool.QueueUserWorkItem(delegate(object obj)
        {
            try
            {
                int currentIndex = (int)obj;
                Generate(currentIndex, paramObjects[currentIndex], reusableObjects[currentIndex]);
            }
            finally
            {
                if (Interlocked.Decrement(ref remainingToGenerate) == 0)
                {
                    resetEvent.Set();
                }
            }
        }, i);
    }
    resetEvent.WaitOne();
}
I've seen significant performance improvements with this approach; however, there are a number of issues to consider:
[1] Collecting values in paramObjects and synchronizing with resetEvent can be avoided, as there is no dependency between the threads (or between the current set of values and the next). I'm only doing this to manage access to reusableObjects: when a set of paramObjects is done processing, I know that all objects in reusableObjects are free, so taskIndex is reset and each new task of the next set of values will have its own unique 'reusableObj' to work with.
[2] There is no real connection between the size of reusableObjects and the number of threads the ThreadPool uses. I might initialize reusableObjects to hold 10 objects, yet due to some limitation the ThreadPool may run only 3 threads for my MultiThreadedGenerate() method; then I'm wasting memory.
So, by getting rid of paramObjects, how can the above code be refined so that as soon as one thread completes its job, it returns the taskIndex (or the reusableObj) it used and no longer needs, making it available to the next value? Also, the code should create a reusableObject and add it to some collection only when there is demand for it. Is using a Queue here a good idea?
Thank you.

There's really no reason to do your own manual threading and task management any more. You could restructure this to a more loosely-coupled model using Task Parallel Library (and possibly System.Collections.Concurrent for result collation).
Performance could be further improved if you don't need to wait for a full complement of work before handing off each Task for processing.
TPL came along in .NET 4.0 but was back-ported to .NET 3.5 as a separate download.
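To make the suggestion concrete, here is a minimal sketch of that restructuring, under the assumption that a ConcurrentBag<T> serves as a trivial object pool (ReusableObject and Generate are stand-ins for the asker's own types and per-item work). Objects are created only when a worker actually demands one, so the pool never outgrows the real degree of parallelism, which also addresses point [2] above:
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class ReusableObject { /* the expensive state the asker wants to reuse */ }

class Generator
{
    // Pool of reusable objects: grows only on demand.
    private readonly ConcurrentBag<ReusableObject> pool = new ConcurrentBag<ReusableObject>();

    public void GenerateAll(IEnumerable<object> parameters)
    {
        Parallel.ForEach(parameters, param =>
        {
            ReusableObject reusable;
            if (!pool.TryTake(out reusable))
                reusable = new ReusableObject(); // created only when demanded

            try
            {
                Generate(param, reusable); // stand-in for the asker's work
            }
            finally
            {
                pool.Add(reusable); // return it for the next work item
            }
        });
    }

    private void Generate(object param, ReusableObject reusable) { /* ... */ }
}
No batching, no resetEvent, and no fixed-size array: Parallel.ForEach decides the concurrency, and the pool tracks it automatically.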

Related

Batch process all items in ConcurrentBag

I have the following use case. Multiple threads are creating data points which are collected in a ConcurrentBag. Every x ms a single consumer thread looks at the data points that came in since the last time and processes them (e.g. count them + calculate average).
The following code more or less represents the solution that I came up with:
private static ConcurrentBag<long> _bag = new ConcurrentBag<long>();

static void Main()
{
    Task.Run(() => Consume());
    var producerTasks = Enumerable.Range(0, 8).Select(i => Task.Run(() => Produce()));
    Task.WaitAll(producerTasks.ToArray());
}

private static void Produce()
{
    for (int i = 0; i < 100000000; i++)
    {
        _bag.Add(i);
    }
}

private static void Consume()
{
    while (true)
    {
        var oldBag = _bag;
        _bag = new ConcurrentBag<long>();
        var average = oldBag.DefaultIfEmpty().Average();
        var count = oldBag.Count;
        Console.WriteLine($"Avg = {average}, Count = {count}");
        // Wait x ms
    }
}
Is a ConcurrentBag the right tool for the job here?
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
Should I use Interlocked.Exchange() for switching the variables?
EDIT
I guess the above code was not really a good representation of what I'm trying to achieve. So here is some more code to show the problem:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
    private readonly List<string> _logMessageBuffer;

    public LogCollectorTarget()
    {
        _logMessageBuffer = new List<string>();
    }

    protected override void Write(LogEventInfo logEvent)
    {
        var logMessage = Layout.Render(logEvent);
        lock (_logMessageBuffer)
        {
            _logMessageBuffer.Add(logMessage);
        }
    }

    public string GetBuffer()
    {
        lock (_logMessageBuffer)
        {
            var messages = string.Join(Environment.NewLine, _logMessageBuffer);
            _logMessageBuffer.Clear();
            return messages;
        }
    }
}
The class' purpose is to collect logs so they can be sent to a server in batches. Every x seconds GetBuffer is called. This should get the current log messages and clear the buffer for new messages. It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program. That's why I wanted to use a ConcurrentBag as a buffer. But then I still need to switch or clear it when I call GetBuffer, without losing any log messages that happen during the switch.
Since you have a single consumer, you can get by with a simple ConcurrentQueue, without swapping collections:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
    private readonly ConcurrentQueue<string> _logMessageBuffer;

    public LogCollectorTarget()
    {
        _logMessageBuffer = new ConcurrentQueue<string>();
    }

    protected override void Write(LogEventInfo logEvent)
    {
        var logMessage = Layout.Render(logEvent);
        _logMessageBuffer.Enqueue(logMessage);
    }

    public string GetBuffer()
    {
        // How many messages should we dequeue?
        var count = _logMessageBuffer.Count;
        var messages = new StringBuilder();
        while (count > 0 && _logMessageBuffer.TryDequeue(out var message))
        {
            messages.AppendLine(message);
            count--;
        }
        return messages.ToString();
    }
}
If memory allocations become an issue, you can instead dequeue them to a fixed-size array and call string.Join on it. This way, you're guaranteed to do only two allocations (whereas the StringBuilder could do many more if the initial buffer isn't properly sized):
public string GetBuffer()
{
    // How many messages should we dequeue?
    var count = _logMessageBuffer.Count;
    var buffer = new string[count];
    for (int i = 0; i < count; i++)
    {
        _logMessageBuffer.TryDequeue(out var message);
        buffer[i] = message;
    }
    return string.Join(Environment.NewLine, buffer);
}
Is a ConcurrentBag the right tool for the job here?
It's the right tool for a job; whether it's right here really depends on what you are trying to do, and why. The example you have given is very simplistic and without any context, so it's hard to tell.
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
The answer is no, for probably many reasons. What happens if a thread writes to it while you are switching it?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
No; you have just copied the reference, which will achieve nothing.
Should I use Interlocked.Exchange() for switching the variables?
Interlocked methods are great things; however, they will not help you with your current problem, as they are meant for thread-safe access to simple values such as integers and references. You are really confused, and you need to look at more thread-safe examples.
However, let's point you in the right direction: forget about ConcurrentBag and those fancy classes. My advice is to start simple and use locking, so you understand the nature of the problem.
If you want multiple tasks/threads to access a list, you can easily use the lock statement and guard access to the list/array so other nasty threads aren't modifying it.
Obviously the code you have written is a nonsensical example; you are just adding consecutive numbers to a list and getting another thread to average them. This hardly needs to be producer-consumer at all, and would make more sense just being synchronous.
At this point I would point you to better architectures that would allow you to implement this pattern, e.g. TPL Dataflow, but I fear this is just a learning exercise, and unfortunately you really need to do more reading on multithreading and try more examples before we can truly help you with a problem.
It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program.
Acquiring an uncontended lock is actually quite cheap. Quoting from Joseph Albahari's book:
You can expect to acquire and release a lock in as little as 20 nanoseconds on a 2010-era computer if the lock is uncontended.
Locking becomes expensive when it is contended. You can minimize the contention by reducing the work inside the critical region to the absolute minimum. In other words don't do anything inside the lock that can be done outside the lock. In your second example the method GetBuffer does a String.Join inside the lock, delaying the release of the lock and increasing the chances of blocking other threads. You can improve it like this:
public string GetBuffer()
{
    string[] messages;
    lock (_logMessageBuffer)
    {
        messages = _logMessageBuffer.ToArray();
        _logMessageBuffer.Clear();
    }
    return String.Join(Environment.NewLine, messages);
}
But it can be optimized even further. You could use the technique of your first example, and instead of clearing the existing List<string>, just swap it with a new list:
public string GetBuffer()
{
    List<string> oldList;
    lock (_logMessageBuffer)
    {
        oldList = _logMessageBuffer;
        _logMessageBuffer = new(); // note: the field must no longer be readonly
    }
    return String.Join(Environment.NewLine, oldList);
}
Starting from .NET Core 3.0, the Monitor class has the property Monitor.LockContentionCount, that returns the number of times there was contention at the entry point of a lock. You could watch the delta of this property every second, and see if the number is concerning. If you get single-digit numbers, there is nothing to worry about.
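For example, a rough sketch of such a watcher (assumes .NET Core 3.0 or later, where the property exists):
long previous = Monitor.LockContentionCount;
while (true)
{
    Thread.Sleep(1000);
    long current = Monitor.LockContentionCount;
    Console.WriteLine($"Lock contentions in the last second: {current - previous}");
    previous = current;
}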
Touching some of your questions:
Is a ConcurrentBag the right tool for the job here?
No. The ConcurrentBag<T> is a very specialized collection, intended for mixed producer-consumer scenarios where the same thread both adds and takes items (mainly object pools). You don't have such a scenario here. A ConcurrentQueue<T> is preferable to a ConcurrentBag<T> in almost all scenarios.
Should I use Interlocked.Exchange() for switching the variables?
Only if the collection was immutable. If the _logMessageBuffer was an ImmutableQueue<T>, then it would be excellent to swap it with Interlocked.Exchange. With mutable types you have no idea if the old collection is still in use by another thread, and for how long. The operating system can suspend any thread at any time for a duration of 10-30 milliseconds or even more. So it's not safe to use lock-free techniques. You have to lock.
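For illustration only, here is a sketch of that immutable variant; it assumes the System.Collections.Immutable package, and the field must not be readonly:
private ImmutableQueue<string> _logMessageBuffer = ImmutableQueue<string>.Empty;

protected override void Write(LogEventInfo logEvent)
{
    var logMessage = Layout.Render(logEvent);
    // Lock-free append: retry until our updated queue wins the race.
    ImmutableQueue<string> current, updated;
    do
    {
        current = _logMessageBuffer;
        updated = current.Enqueue(logMessage);
    }
    while (Interlocked.CompareExchange(ref _logMessageBuffer, updated, current) != current);
}

public string GetBuffer()
{
    // Atomically swap the whole queue for an empty one. No lock is needed,
    // because the old queue can never be mutated by anyone.
    var old = Interlocked.Exchange(ref _logMessageBuffer, ImmutableQueue<string>.Empty);
    return string.Join(Environment.NewLine, old);
}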

Threads cause GUI to freeze up

So I'm not the most experienced with the C# programming language; however, I've been making a few test applications here and there.
I've noticed that the more threads I create for an application I'm working on, the more my GUI starts to freeze. I'm not sure why this occurs; I previously thought that part of the point of multi-threading an application was to keep the GUI from freezing up.
An explanation would be appreciated.
Also, here's the code I use to create the threads:
private void runThreads(int amount, ThreadStart address)
{
    for (int i = 0; i < amount; i++)
    {
        threadAmount += 1;
        Thread currentThread = new Thread(address);
        currentThread.Start();
    }
}
and here's what the threads run:
private void checkProxies()
{
    while (started)
    {
        try
        {
            WebRequest request = WebRequest.Create("http://google.co.nz/");
            request.Timeout = (int)timeoutCounter.Value * 1000;
            request.Proxy = new WebProxy(proxies[proxyIndex]);
            Thread.SetData(Thread.GetNamedDataSlot("currentProxy"), proxies[proxyIndex]);
            if (proxyIndex != proxies.Length)
            {
                proxyIndex += 1;
            }
            else
            {
                started = false;
            }
            request.GetResponse();
            workingProxies += 1;
        }
        catch (WebException)
        {
            deadProxies += 1;
        }
        lock ("threadAmount")
        {
            if (threadAmount > proxies.Length - proxyIndex)
            {
                threadAmount -= 1;
                break;
            }
        }
    }
}
While I cannot tell you exactly why your code is slowing the GUI down, there are a few things in your code you should do to make it all-round better. If the problem persists then, it should be a lot easier to pinpoint the issue.
Creating Thread objects is expensive. This is why new classes were added to .NET to better handle multi-threading. Now you have access to the Task class and the Parallel class (described below).
Judging from the comments, you're running a LOT of threads at the same time. While it shouldn't be an issue to just run them, you're not really getting much use out of them if what you're doing is firing WebRequests (unless you have an awesome network). Use multiple threads, sure, but limit their number.
A Task is great when you want to do a specific operation in the background. But when you want to repeat a single operation for a specific set of data in the background... why not use the System.Threading.Tasks.Parallel class? Specifically, Parallel.ForEach (where you can specify your list of proxies as the parameter). This method also lets you set up how many threads are supposed to run concurrently at any given moment by using ParallelOptions.
One more way to code would be to make use of the async and await keywords available in .NET 4.5. In this case your GUI (button press?) should invoke an async method.
Use thread-safe methods like Interlocked.Increment or Interlocked.Add to increase / decrease counters available from multiple threads. Also, consider that you could change your list of proxies to a ConcurrentDictionary<string, bool> (where bool indicates if the proxy works or not) and set the values without any worry, as each thread will only access its own entry in the dictionary. You can easily query the totals at the end using LINQ: dictionary.Where(q => q.Value).Count() to get the number of working proxies, for example. Of course other classes are also available, depending on how you want to tackle the issue - perhaps a Queue (or ConcurrentQueue)?
Your lock shouldn't really work... as in, it seems to be working by accident rather than by design in your code (thanks Luaan for the comment). You really shouldn't lock on a string literal: interning means every piece of code locking on the same literal shares one lock object. Consult the MSDN documentation on lock to get a better understanding of how it works. The Object created in the MSDN example isn't just for show.
You can also make the requests themselves multi-threaded by using BeginGetResponse and EndGetResponse methods. You can, in fact, combine this with the Task class for a much cleaner code (the Task class can convert Begin/End method pairs into a single Task object).
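For example, a small sketch of wrapping that Begin/End pair into a Task (WebRequest's API is real; what you do with the resulting task is up to you):
WebRequest request = WebRequest.Create("http://google.co.nz/");
Task<WebResponse> responseTask = Task.Factory.FromAsync(
    request.BeginGetResponse, // begin method
    request.EndGetResponse,   // end method
    null);                    // optional state object
responseTask.ContinueWith(t => Console.WriteLine(t.Status));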
So, a quick recap - use the Parallel class for multi-threading, and use concurrency classes for keeping things in place.
Here's a quick example I wrote:
private ConcurrentDictionary<string, bool?> values = new ConcurrentDictionary<string, bool?>();
private async void Button_Click(object sender, RoutedEventArgs e)
{
var result = await CheckProxies();
label.Content = result.ToString();
}
async Task<int> CheckProxies()
{
//I don't actually HAVE a list of proxies, so I make up some data
for (int i = 0; i < 1000; i++)
values[Guid.NewGuid().ToString()] = null;
await Task.Factory.StartNew(() => Parallel.ForEach(values, new ParallelOptions() { MaxDegreeOfParallelism = 10 }, this.PeformOperation));
//note that with maxDegreeOfParallelism set to a high value (like 1000)
//then I'll get a TON of failed requests simply because I'm overloading the network
//either that or google thinks I'm DDOSing them... >_<
return values.Where(v => v.Value == true).Count();
}
void PeformOperation(KeyValuePair<string, bool?> kvp)
{
try
{
WebRequest request = WebRequest.Create("http://google.co.nz/");
request.Timeout = 100;
//I'm not actually setting up the proxy from kvp,
//because it's populated with bogus data
request.GetResponse();
values[kvp.Key] = true;
}
catch (WebException)
{
values[kvp.Key] = false;
}
}
Despite the other comments being correct in that you should use either the Task class, or better yet, the async API, that is not the reason you're locking up.
The line of code that is causing your threads to lock up is this:
request.Timeout = (int)timeoutCounter.Value * 1000;
I am assuming that timeoutCounter is a control on the WinForm, which is running on the main GUI thread.
In other words, your threads' code is trying to access a control which does not live on its own thread, which is not really "allowed", at least not so simply.
For example, this question shows how to do this, albeit most of the answers there are a bit dated.
From a quick google (okay who am I kidding, I binged it) I found this article that explains the problem rather well.
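A sketch of the usual fix, using the names from the question: read the control once on the UI thread and hand the plain value to the workers (this assumes checkProxies is changed to take the timeout as a parameter, and amount is however many threads you start):
// On the UI thread, before starting the workers:
int timeoutMs = (int)timeoutCounter.Value * 1000;
runThreads(amount, () => checkProxies(timeoutMs)); // the lambda still matches ThreadStart

// Or, if a worker really must read the control, marshal the call back (WinForms):
int timeout = 0;
timeoutCounter.Invoke((Action)(() => timeout = (int)timeoutCounter.Value * 1000));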

How to ensure that part of code (which contains async) would be called only once?

So, I have a method which contains an async call to the server.
That code is called from a 3rd party tool, which somehow sometimes calls the same method several times in a row from different threads, so I can't affect that.
What I want to be sure of is that my method is called once, and that further calls are ignored.
At first I tried lock(locker) with a bool isBusy, but that did not satisfy me, as the async request was still called several times from a second thread that was fast enough to get in before it could see isBusy == true.
Then I tried Monitor:
object obj = new object();
Monitor.TryEnter(obj);
try
{
    var res = await _dataService.RequestServerAsync(SelectedIndex, e.StartIndex, e.Count);
    ****
}
finally
{
    Monitor.Exit(obj);
}
However, on Exit() I'm getting an exception:
A first chance exception of type 'System.Threading.SynchronizationLockException'
Is there any other way to guarantee that the code executes only once?
Put in the class:
private int entered = 0;
and in the method:
if (Interlocked.Increment(ref entered) != 1)
{
    return;
}
Only the first call to the method will be able to change entered from 0 to 1. The others will make it 2, 3, 4, 5...
Clearly you'll need something to reset entered if you want your method to be refireable...
Interlocked.Exchange(ref entered, 0);
at the end of a successful call to the method.
Ah... and it isn't possible to use lock/Monitor.* around an await, because the thread executing the method can change, while nearly all synchronization primitives expect that the thread used to enter a lock is the same one that exits it.
You can even use the Interlocked.CompareExchange()...
if (Interlocked.CompareExchange(ref entered, 1, 0) != 0)
{
    return;
}
The first thread to enter will be able to exchange the value of entered from 0 to 1, and it will receive the old value, 0 (so failing the if and continuing with the rest of the method). The other threads will fail the CompareExchange, see the "current" value of 1, and so enter the if and exit the method.
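Putting the pieces together, a sketch of the whole guarded method might look like this (the _dataService call is from the question; the method name and parameters are made up):
private int entered = 0;

public async Task RequestOnceAsync(int selectedIndex, int startIndex, int count)
{
    if (Interlocked.CompareExchange(ref entered, 1, 0) != 0)
        return; // another call is already in flight: ignore this one

    try
    {
        var res = await _dataService.RequestServerAsync(selectedIndex, startIndex, count);
        // ... use res ...
    }
    finally
    {
        Interlocked.Exchange(ref entered, 0); // make the method refireable
    }
}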
If you do want to restrict the number of threads using the same method concurrently, then I would use the Semaphore class to facilitate the required thread limit; here's how...
A semaphore is like a mean night club bouncer: it has been given a club capacity and is not allowed to exceed this limit. Once the club is full, no one else can enter, and a queue builds up outside. Then, as one person leaves, another can enter (analogy thanks to J. Albahari).
A Semaphore with a value of one is equivalent to a Mutex or Lock except that the Semaphore has no owner so that it is thread ignorant. Any thread can call Release on a Semaphore whereas with a Mutex/Lock only the thread that obtained the Mutex/Lock can release it.
Now, for your case we are able to use Semaphores to limit concurrency and prevent too many threads from executing a particular piece of code at once. In the following example five threads try to enter a night club that only allows entry to three...
class BadAssClub
{
    static SemaphoreSlim sem = new SemaphoreSlim(3);

    static void Main()
    {
        for (int i = 1; i <= 5; i++)
            new Thread(Enter).Start(i);
    }

    // Enforce only three threads running this method at once.
    static void Enter(object i)
    {
        try
        {
            Console.WriteLine(i + " wants to enter.");
            sem.Wait();
            Console.WriteLine(i + " is in!");
            Thread.Sleep(1000 * (int)i);
            Console.WriteLine(i + " is leaving...");
        }
        finally
        {
            sem.Release();
        }
    }
}
Note that SemaphoreSlim is a lighter-weight version of the Semaphore class and incurs about a quarter of the overhead. It is sufficient for what you require.
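Since the asker's method awaits a server call, it is worth noting that SemaphoreSlim is also await-friendly: WaitAsync(0) completes immediately with false when the gate is already taken, which matches the "ignore further calls" requirement. A sketch (the method name is made up):
private static readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

public async Task RequestGuardedAsync()
{
    if (!await gate.WaitAsync(0))
        return; // someone else is already in: ignore this call

    try
    {
        // await _dataService.RequestServerAsync(...); // the asker's call goes here
    }
    finally
    {
        gate.Release();
    }
}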
I hope this helps.

how to restrict a method to n concurrent calls

I have an integration service which runs a calculation-heavy, data-bound process. I want to make sure that there are never more than, say, n = 5 (but n will be configurable, changeable at runtime) of these processes running at the same time. The idea is to throttle the load on the server to a safe level. The amount of data processed by the method is limited by batching, so I don't need to worry about one process representing a much bigger load than another.
The processing method is called by another process, where requests to run payroll are held on a queue, and I can insert some logic at that point to determine whether to process this request now, or leave it on the queue.
So I want a separate method, on the same service as the processing method, which can tell me whether the server can accept another call to the processing method. It's going to ask: "How many payroll runs are going on? Is that less than n?" What's a neat way of achieving this?
EDIT
I think I need to make it clear: the process that decides whether to take the request off the queue is separated from the service that processes the payroll data by a WCF boundary. Stopping a thread in the payroll-processing process isn't going to prevent more requests coming in.
You can use a Semaphore to do this.
public class Foo
{
    private Semaphore semaphore;

    public Foo(int numConcurrentCalls)
    {
        semaphore = new Semaphore(numConcurrentCalls, numConcurrentCalls);
    }

    // Note: a successful WaitOne(0) acquires a slot, so a caller that gets
    // 'true' here has already reserved its place and must eventually Release.
    public bool isReady()
    {
        return semaphore.WaitOne(0);
    }

    public void Bar()
    {
        try
        {
            semaphore.WaitOne(); // it will only get past this line if there are fewer
                                 // than "numConcurrentCalls" threads in this method
            // do stuff
        }
        finally
        {
            semaphore.Release();
        }
    }
}
Review the Object Pool pattern. This is what you're describing. While not strictly required by the pattern, you can expose the number of objects currently in the pool, the maximum (configured) number, the high-watermark, etc.
I think that you might want a BlockingCollection, where each item in the collection represents one of the concurrent calls.
Also see IProducerConsumerCollection.
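One hedged reading of that suggestion: treat a bounded BlockingCollection as a bag of tokens, one per allowed concurrent call (ProcessPayroll is a stand-in for the real processing method):
var tokens = new BlockingCollection<byte>(boundedCapacity: 5);
for (int i = 0; i < 5; i++)
    tokens.Add(0); // five tokens = five concurrent calls allowed

// Around each processing call:
byte token = tokens.Take(); // blocks while all five tokens are in use
try
{
    ProcessPayroll(); // stand-in for the real processing method
}
finally
{
    tokens.Add(token); // hand the token back
}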
If you were just using threads, I'd suggest you look at the methods for limiting thread concurrency (e.g. the TaskScheduler.MaximumConcurrencyLevel property, and this example.).
Also see ParallelEnumerable.WithDegreeOfParallelism
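For instance, a one-line sketch capping a PLINQ query at five concurrent workers (requests and ProcessRun are hypothetical):
var results = requests.AsParallel()
                      .WithDegreeOfParallelism(5)
                      .Select(ProcessRun)
                      .ToList();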
void ThreadTest()
{
    ConcurrentQueue<int> q = new ConcurrentQueue<int>();
    int MaxCount = 5;
    Random r = new Random();
    for (int i = 0; i <= 10000; i++)
    {
        q.Enqueue(r.Next(100000, 200000));
    }
    ThreadStart proc = null;
    proc = () =>
    {
        int read = 0;
        if (q.TryDequeue(out read))
        {
            Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] starting: {0}... #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
            Thread.Sleep(r.Next(100, 1000));
            Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] {0} ended! #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
            // Note: this recursive call deepens the stack with every item;
            // a while loop would be the safer choice for long queues.
            proc();
        }
    };
    for (int i = 0; i < MaxCount; i++) // exactly MaxCount worker threads
    {
        new Thread(proc).Start();
    }
}

Multi-Threading - waiting for all threads to be signalled

I have scenarios where I need a main thread to wait until every one of a set of possibly more than 64 threads has completed its work, and for that I wrote the following helper utility (to avoid the 64-handle limit on WaitHandle.WaitAll()):
public static void WaitAll(WaitHandle[] handles)
{
    if (handles == null)
        throw new ArgumentNullException("handles",
            "WaitHandle[] handles was null");
    foreach (WaitHandle wh in handles) wh.WaitOne();
}
With this utility method, however, each wait handle is only examined after every preceding one in the array has been signalled, so it is in effect sequential. It also will not work if the wait handles are AutoResetEvents (which clear as soon as a waiting thread has been released).
To fix this issue I am considering changing this code to the following, but would like others to check and see if it looks like it will work, or if anyone sees any issues with it, or can suggest a better way ...
Thanks in advance:
public static void WaitAllParallel(WaitHandle[] handles)
{
    if (handles == null)
        throw new ArgumentNullException("handles",
            "WaitHandle[] handles was null");
    int actThreadCount = handles.Length;
    object locker = new object();
    foreach (WaitHandle wh in handles)
    {
        WaitHandle qwH = wh;
        ThreadPool.QueueUserWorkItem(
            delegate
            {
                try { qwH.WaitOne(); }
                finally { lock (locker) --actThreadCount; }
            });
    }
    while (actThreadCount > 0) Thread.Sleep(80);
}
If you know how many threads you have, you can use an interlocked decrement. This is how I usually do it:
AutoResetEvent eventDone;
int totalCount;

void StartWork() // wrapper method name added for illustration
{
    eventDone = new AutoResetEvent(false);
    totalCount = 128;
    for (int i = 0; i < 128; i++)
    {
        ThreadPool.QueueUserWorkItem(ThreadWorker /* , ... state ... */);
    }
    eventDone.WaitOne(); // the waiter blocks here until the last worker signals
}

void ThreadWorker(object state)
{
    try
    {
        // ... work and more work
    }
    finally
    {
        int runningCount = Interlocked.Decrement(ref totalCount);
        if (0 == runningCount)
        {
            // This is the last thread, notify the waiters
            eventDone.Set();
        }
    }
}
Actually, most times I don't even signal, but instead invoke a callback that continues the processing from where the waiter would have continued. Fewer blocked threads, more scalability.
I know this is different and may not apply to your case (e.g. it surely won't work if some of those handles are held by I/O or events rather than threads), but it may be worth thinking about.
I'm not sure what exactly you're trying to do, but would a CountdownEvent (.NET 4.0) conceptually solve your problem?
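For reference, a minimal CountdownEvent sketch (workItems is a hypothetical array of Action delegates); unlike WaitHandle.WaitAll, there is no 64-handle limit:
using (var countdown = new CountdownEvent(workItems.Length))
{
    foreach (Action work in workItems)
    {
        Action w = work; // capture a fresh variable per iteration
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { w(); }
            finally { countdown.Signal(); }
        });
    }
    countdown.Wait(); // blocks until every worker has signalled
}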
I'm not a C# or .NET programmer, but you could use a semaphore that is posted when one of your worker threads exits. The monitoring thread would simply wait on the semaphore n times, where n is the number of worker threads. Semaphores are traditionally used to count resources in use, but they can be used to count completed jobs by waiting on the same semaphore n times.
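A sketch of that idea in C# terms (n is the worker count; DoWork is a hypothetical worker body):
var done = new SemaphoreSlim(0); // starts at zero: nothing completed yet
for (int i = 0; i < n; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try { DoWork(); }
        finally { done.Release(); } // post once per finished job
    });
}
for (int i = 0; i < n; i++)
    done.Wait(); // consume n completions; returns when all workers are done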
When working with lots of simultaneous threads, I prefer to add each thread's ManagedThreadId into a Dictionary when I start the thread, and then have each thread invoke a callback routine that removes the dying thread's id from the Dictionary. The Dictionary's Count property tells you how many threads are active. Use the value side of the key/value pair to hold info that your UI thread can use to report status. Wrap the Dictionary with a lock to keep things safe.
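A sketch of that registry idea (the status strings are placeholders for whatever the UI wants to show):
private readonly Dictionary<int, string> activeThreads = new Dictionary<int, string>();

public int ActiveCount
{
    get { lock (activeThreads) return activeThreads.Count; }
}

public void StartTracked(Action body)
{
    var t = new Thread(() =>
    {
        int id = Thread.CurrentThread.ManagedThreadId;
        lock (activeThreads) activeThreads[id] = "running"; // value side: status info
        try { body(); }
        finally { lock (activeThreads) activeThreads.Remove(id); } // removal on death
    });
    t.Start();
}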
// A variation on the WaitAllParallel worker above: a per-handle timeout,
// with Interlocked doing the bookkeeping instead of a lock.
ThreadPool.QueueUserWorkItem(o =>
{
    try
    {
        using (var h = (o as WaitHandle))
        {
            if (!h.WaitOne(100000))
            {
                // Alert main thread of the timeout
            }
        }
    }
    finally
    {
        Interlocked.Decrement(ref actThreadCount);
    }
}, wh);
