In my application I have to listen on several different queues and deserialize/dispatch the messages received on them.
Currently, each QueueConnector object creates a new thread on construction, which runs an infinite loop with a blocking call to queue.Receive() to get the next message in the queue, as shown in the code below:
// Instantiate message pump thread
msmqPumpThread = new Thread(() =>
{
    while (true)
    {
        // Blocking call (infinite timeout):
        // wait for a new message to arrive in the queue and get it
        var message = queue.Receive();
        // Deserialize/dispatch the message
        DeserializeAndDispatchMessage(message);
    }
});
msmqPumpThread.Start();
I'd like to know whether this "message pump" can be replaced using Task(s) instead of running an infinite loop on a new Thread.
I have already written a task for the message-receiving part (see below), but I don't really see how to use it for a message pump. Can I re-invoke the same task on completion over and over again, with continuations, replacing the infinite loop on a separate thread in the code above?
Task<Message> GetMessageFromQueueAsync()
{
    var tcs = new TaskCompletionSource<Message>();
    ReceiveCompletedEventHandler receiveCompletedHandler = null;
    receiveCompletedHandler = (s, e) =>
    {
        queue.ReceiveCompleted -= receiveCompletedHandler;
        tcs.SetResult(e.Message);
    };
    queue.ReceiveCompleted += receiveCompletedHandler; // the handler must be attached before BeginReceive
    queue.BeginReceive();
    return tcs.Task;
}
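For example, could the pump become something like this (a hypothetical sketch using async/await, reusing the method above)?
async Task MessagePumpAsync()
{
    while (true)
    {
        // Await the next message without blocking a thread
        var message = await GetMessageFromQueueAsync();
        DeserializeAndDispatchMessage(message);
    }
}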
Will I gain anything by using Tasks instead of an infinite loop on a separate thread (with a blocking call, hence a blocked thread) in this context? And if yes, how do I do it properly?
Please note that this application doesn't have a lot of QueueConnector objects and won't have (maybe 10 connectors max), meaning ten threads at most with the first solution, so memory footprint and thread-startup cost are not an issue here. I was rather thinking about scheduling performance / CPU usage. Will there be any difference?
You will generally have more overhead and less throughput with async code when the thread count is low. Non-blocking code is most useful when the number of threads is very high, causing a) lots of wasted memory for stacks and b) context switches. It has noticeable overhead, though, because of more allocation, more indirection and more user-kernel transitions.
For low thread counts (< 100) you probably shouldn't worry. Try to focus on writing maintainable, bug-resistant and simple code. Use threads.
Related
Will a large number of Task.Delay calls cause performance problems, or is there a better way when I want to delay delivery of messages to RabbitMQ?
I recently wrote an event bus combined with Orleans. When a consumer fails, I want it to retry several times within 5 minutes, to recover from short-term system unavailability. I want to use await Task.Delay to implement this, but I am not sure whether it will affect performance or whether there is a better way to implement my idea.
Thanks.
A large number of anything will cause performance problems; however, an awaited Task.Delay is one of the better approaches. It's lightweight, doesn't block a thread, and runs on fairly lightweight plumbing. Its implementation is as follows:
public static Task Delay(int millisecondsDelay, CancellationToken cancellationToken)
{
    // error checking
    Task.DelayPromise delayPromise = new Task.DelayPromise(cancellationToken);
    if (cancellationToken.CanBeCanceled)
        delayPromise.Registration = cancellationToken.InternalRegisterWithoutEC((Action<object>) (state => ((Task.DelayPromise) state).Complete()), (object) delayPromise);
    if (millisecondsDelay != -1)
    {
        delayPromise.Timer = new Timer((TimerCallback) (state => ((Task.DelayPromise) state).Complete()), (object) delayPromise, millisecondsDelay, -1);
        delayPromise.Timer.KeepRootedWhileScheduled();
    }
    return (Task) delayPromise;
}
The Timer just wraps the Win32 timer queue, which is a delta-queue that fires events on the thread pool:
Timer Queues
The CreateTimerQueue function creates a queue for timers. Timers in this queue, known as timer-queue timers, are lightweight objects that enable you to specify a callback function to be called when the specified due time arrives.
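For the retry scenario in the question, a minimal sketch could look like this (ConsumeWithRetryAsync and the delay/attempt values are illustrative, not an existing API):
async Task ConsumeWithRetryAsync(Func<Task> consume, int maxAttempts = 5)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await consume();
            return;
        }
        catch when (attempt < maxAttempts)
        {
            // Wait without blocking a thread pool thread
            await Task.Delay(TimeSpan.FromMinutes(1));
        }
    }
}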
I am somewhat new to parallel programming in C# (when I started my project I worked through the MSDN examples for the TPL) and would appreciate some input on the following example code.
It is one of several background worker tasks. This specific task pushes status messages to a log.
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new ConcurrentQueue<string>();

var backgroundUiTask = new Task(
    () =>
    {
        while (!uiCts.IsCancellationRequested)
        {
            while (globalMsgQueue.Count > 0)
                ConsumeMsgQueue();
            Thread.Sleep(backgroundUiTimeOut);
        }
    },
    uiCts.Token);

// Somewhere else entirely
backgroundUiTask.Start();
Task.WaitAll(backgroundUiTask);
I'm asking for professional input after reading several topics like Alternatives to using Thread.Sleep for waiting, Is it always bad to use Thread.Sleep()?, When to use Task.Delay, when to use Thread.Sleep?, and Continuous polling using Tasks.
Those prompt me to use Task.Delay instead of Thread.Sleep as a first step and to introduce TaskCreationOptions.LongRunning.
But I wonder what other caveats I might be missing? Is polling the MsgQueue.Count a code smell? Would a better version rely on an event instead?
First of all, there's no reason to use Task.Start or the Task constructor. Tasks aren't threads; they don't run themselves. They are a promise that something will complete in the future and may or may not produce a result. Some of them run on a threadpool thread. Use Task.Run to create and run a task in a single step when you need to.
I assume the actual problem is how to create a buffered background worker. .NET already offers classes that can do this.
ActionBlock<T>
The ActionBlock class already implements this and a lot more - it allows you to specify how big the input buffer is and how many tasks will process incoming messages concurrently, and it supports cancellation and asynchronous completion.
A logging block could be as simple as this:
_logBlock = new ActionBlock<string>(msg => File.AppendAllText("myLog.txt", msg));
The ActionBlock class itself takes care of buffering the inputs, feeding new messages to the worker function as they arrive, potentially blocking senders if the buffer gets full, etc. There's no need for polling.
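For example, buffering and concurrency can be configured through ExecutionDataflowBlockOptions; the values below are just illustrative:
var options = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 1000,      // at most 1000 buffered messages
    MaxDegreeOfParallelism = 1   // process messages one at a time
};
_logBlock = new ActionBlock<string>(
    msg => File.AppendAllText("myLog.txt", msg),
    options);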
Other code can use Post or SendAsync to send messages to the block:
_block.Post("some message");
When we are done, we can tell the block to Complete() and await its Completion so any remaining messages are processed:
_block.Complete();
await _block.Completion;
Channels
A newer, lower-level option is to use Channels. You can think of channels as a kind of asynchronous queue, although they can be used to implement complex processing pipelines. If ActionBlock was written today, it would use Channels internally.
With channels, you need to provide the "worker" task yourself. There's no need for polling though, as the ChannelReader class allows you to read messages asynchronously or even use await foreach.
The writer method could look like this:
public ChannelWriter<string> LogIt(string path, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<string>();
    var writer = channel.Writer;
    _ = Task.Run(async () =>
    {
        await foreach (var msg in channel.Reader.ReadAllAsync(token))
        {
            File.AppendAllText(path, msg);
        }
    }, token).ContinueWith(t => writer.TryComplete(t.Exception));
    return writer;
}

...

_logWriter = LogIt(somePath);
Other code can send messages by using WriteAsync or TryWrite, e.g.:
_logWriter.TryWrite(someMessage);
When we're done, we can call Complete() or TryComplete() on the writer:
_logWriter.TryComplete();
The line
.ContinueWith(t => writer.TryComplete(t.Exception));
is needed to ensure the channel is closed even if an exception occurs or the cancellation token is signaled.
This may seem too cumbersome at first, but channels allow us to easily run initialization code or carry state from one message to the next. For example, we could open a stream before the loop starts and use it instead of reopening the file on every File.AppendAllText call:
public ChannelWriter<string> LogIt(string path, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<string>();
    var writer = channel.Writer;
    _ = Task.Run(async () =>
    {
        //***** Can't do this with an ActionBlock ****
        using (var streamWriter = File.AppendText(path))
        {
            await foreach (var msg in channel.Reader.ReadAllAsync(token))
            {
                streamWriter.WriteLine(msg);
                // Or:
                // await streamWriter.WriteLineAsync(msg);
            }
        }
    }, token).ContinueWith(t => writer.TryComplete(t.Exception));
    return writer;
}
Task.Delay is definitely better than Thread.Sleep, because you will not be blocking a thread from the pool; during the wait, that pool thread is available to handle other tasks. Also, you don't need to make your task long-running: long-running tasks run on a dedicated thread, and there Task.Delay is meaningless.
Instead, I will recommend a different approach: just use System.Threading.Timer and make your life simple. The timer runs its callback on the thread pool, and you will not have to worry about delay or sleep.
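A minimal sketch, reusing the names from the question and assuming backgroundUiTimeOut is a millisecond interval:
// Runs ConsumeMsgQueue on a thread pool thread every backgroundUiTimeOut ms
var timer = new System.Threading.Timer(
    _ => ConsumeMsgQueue(),
    null,
    dueTime: 0,
    period: backgroundUiTimeOut);

// Later, when shutting down:
timer.Dispose();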
The TPL Dataflow library is the preferred tool for this kind of job. It allows building efficient producer-consumer pairs quite easily, and more complex pipelines as well, while offering a complete set of configuration options. In your case using a single ActionBlock should be enough.
A simpler solution you might consider is to use a BlockingCollection. It has the advantage of not requiring the installation of any package (because it is built-in), and it's also much easier to learn. You don't have to learn more than the methods Add, CompleteAdding, and GetConsumingEnumerable. It also supports cancellation. The drawback is that it's a blocking collection, so it blocks the consumer thread while waiting for new messages to arrive, and the producer thread while waiting for available space in the internal buffer (only if you specify a boundedCapacity in the constructor).
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new BlockingCollection<string>();

var backgroundUiTask = new Task(() =>
{
    foreach (var item in globalMsgQueue.GetConsumingEnumerable(uiCts.Token))
    {
        ConsumeMsgQueueItem(item);
    }
}, uiCts.Token);
The BlockingCollection uses a ConcurrentQueue internally as a buffer.
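The producer side is then just Add calls, with CompleteAdding signaling the end of the stream (a sketch):
globalMsgQueue.Add("some message"); // called from any producer thread

// When no more messages will be produced:
globalMsgQueue.CompleteAdding();    // GetConsumingEnumerable then completes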
Always managing to confuse myself working with async, I'm after a bit of validation/confirmation that I'm doing what I think I'm doing in the following scenarios.
Given the following trivial example:
// pretend / assume these are json msgs or something ;)
var strEvents = new List<string> { "event1", "event2", "event3" };
I can post each event to an Event Hub simply as follows:
foreach (var e in strEvents)
{
    // Do some things
    outEventHub.Add(e); // ICollector
}
The foreach will run on a single thread and execute each thing inside sequentially... and I guess the posting to the Event Hub will also stay on that same thread?
Changing ICollector to IAsyncCollector, I can achieve the following:
foreach (var e in strEvents)
{
    // Do some things
    await outEventHub.AddAsync(e);
}
I think I'm right here in saying that the foreach will run on a single thread, and the actual sending to the Event Hub will be pushed off elsewhere? Or at least it won't block that same thread...
Changing to Parallel.ForEach, as these events will be arriving 100+ or so at a time:
Parallel.ForEach(events, async (e) =>
{
    // Do some things
    await outEventHub.AddAsync(e);
});
Now it starts to get a bit hazy, as I'm not sure what is really going on... AFAIK each event gets its own thread (within the bounds of the hardware) and the steps within that thread don't block it... this trivial example aside.
Finally, I thought I could turn them all into Tasks:
private static async Task DoThingAsync(string e, IAsyncCollector<string> outEventHub)
{
    await outEventHub.AddAsync(e);
}

var t = new List<Task>();
foreach (var e in strEvents)
{
    t.Add(DoThingAsync(e, outEventHub));
}
await Task.WhenAll(t);
Now I am really hazy; I think this is prepping everything on a single thread and then running everything at exactly the same time, on whatever threads are available?
I appreciate that benchmarking is required to determine which is right for the job at hand... but an explanation of what the framework is doing in each situation would be super helpful for me right now.
Parallel != async
This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:
Simple foreach
This is non-parallel and non-async. Nothing to talk about.
Await inside foreach
This is async code that is non-parallel.
foreach (var e in strEvents)
{
    // Do some things
    await outEventHub.AddAsync(e);
}
This will all take place on a single thread. It takes an event, starts adding it to your event hub, and while that completes (I'm guessing it does some sort of network IO) it hands the thread back to the thread pool (or the UI, if it was called on a UI thread), so other work can be done while waiting for AddAsync to return. But, as you said, it is not parallel at all.
Parallel Foreach (async)
This one is a trap! In short, Parallel.ForEach is designed for synchronous workloads. We'll get back to this, but first let's assume you used it with non-async code.
Parallel foreach (sync)
A.k.a. Parallel but not async.
Parallel.ForEach(events, (e) =>
{
    // Do some things
    outEventHub.Add(e);
});
Each item will get its own "Task", but tasks won't each spawn a thread. Creating threads is expensive, and in the optimal case there is no point in having more threads than CPU cores. Instead, these tasks run on a ThreadPool, which has just as many threads as is optimal. Each thread takes a task, works on it, then takes another one, and so on.
You can think of it as - on a 4-core machine - having 4 workers around a pile of tasks, so 4 of them are being run at a time. You can imagine that this is not ideal for IO-bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked trying to send an event out when they could be doing useful work. This leads us to...
Tasks
Async and potentially parallel (depends on the usage).
Your description is correct here too, except for the ThreadPool part: the main thread kicks off all the tasks at once, and they then run on the pool's threads. While they are running, the main thread is released and can do other work as needed. Up to this point it is the same as the Parallel.ForEach case. But:
What happens is that a ThreadPool thread picks up a task, does the necessary preprocessing, then sends out the network request asynchronously. This means the task will not block while waiting for the network; instead it releases the ThreadPool thread to pick up another work item. When the network request completes, the task's continuation (the remaining code lines after the network request) is scheduled back onto the list of tasks.
You can see that theoretically this is the most efficient process, so fast that you have to be careful not to flood your network.
Back to Parallel.ForEach and async
At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e); } does is kick off the work; it returns right after it hits the await. (Remember that async/await releases threads while waiting.) Parallel.ForEach returns right after it has started all of them. But nothing awaits these tasks! They become fire-and-forget, which is usually bad practice. It is as if you deleted the await Task.WhenAll call from your Task example.
I hope this cleared most things for you, if not, let me know what to improve on.
Why don't you send those events asynchronously in parallel, like this:
var tasks = new List<Task>();
foreach (var e in strEvents)
{
    tasks.Add(outEventHub.AddAsync(e));
}
await Task.WhenAll(tasks);
await outEventHub.FlushAsync();
I am using the ThreadPool in my application. I first set the limit of the thread pool using the following:
ThreadPool.SetMaxThreads(m_iThreadPoolLimit,m_iThreadPoolLimit);
m_Events = new ManualResetEvent(false);
and then I have queued up the jobs using the following
WaitCallback objWcb = new WaitCallback(abc);
ThreadPool.QueueUserWorkItem(objWcb, m_objThreadData);
Here abc is the name of the function that I am calling.
After this I do the following so that all my threads converge at one point, after which the main thread takes over and continues:
m_Events.WaitOne();
My thread limit is 3. The problem I am facing is that, in spite of the thread pool limit being set to 3, my application is processing more than 3 files at the same time, whereas it was supposed to process only 3 files at a time. Please help me solve this issue.
What kind of computer are you using?
From MSDN:
You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer.
If you have 4 cores, then the smallest you can have is 4.
Also note:
If the common language runtime is hosted, for example by Internet Information Services (IIS) or SQL Server, the host can limit or prevent changes to the thread pool size.
If this is a web site hosted by IIS then you cannot change the thread pool size either.
A better solution involves the use of a Semaphore, which can throttle concurrent access to a resource.¹ In your case the resource would simply be the block of code that processes work items.
var finished = new CountdownEvent(1); // Used to wait for the completion of all work items.
var throttle = new Semaphore(3, 3);   // Used to throttle the processing of work items.
foreach (WorkItem item in workitems)
{
    finished.AddCount();
    WorkItem capture = item; // Needed to safely capture the loop variable.
    ThreadPool.QueueUserWorkItem(
        (state) =>
        {
            throttle.WaitOne();
            try
            {
                ProcessWorkItem(capture);
            }
            finally
            {
                throttle.Release();
                finished.Signal();
            }
        }, null);
}
finished.Signal();
finished.Wait();
In the code above WorkItem is a hypothetical class that encapsulates the specific parameters needed to process your tasks.
The Task Parallel Library makes this pattern a lot easier. Just use the Parallel.ForEach method and specify a ParallelOptions.MaxDegreeOfParallelism that throttles the concurrency.
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = 3;
Parallel.ForEach(workitems, options,
    (item) =>
    {
        ProcessWorkItem(item);
    });
¹ I should point out that I do not like blocking ThreadPool threads with a Semaphore or any other blocking device. It basically wastes those threads. You might want to rethink your design entirely.
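For instance, a non-blocking variant is possible with SemaphoreSlim.WaitAsync (a sketch, not part of the original answer's code; Select requires System.Linq):
var throttle = new SemaphoreSlim(3, 3);
var tasks = workitems.Select(async item =>
{
    await throttle.WaitAsync(); // waits without blocking a pool thread
    try
    {
        ProcessWorkItem(item);
    }
    finally
    {
        throttle.Release();
    }
});
await Task.WhenAll(tasks);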
You should use a Semaphore object to limit concurrent threads.
You say the files are open: are they actually being actively processed, or just left open?
If you're leaving them open: been there, done that! Relying on connections and resources (it was a DB connection in my case) being closed at end of scope should work, but it can take a while for the dispose / garbage collection to kick in.
What does it mean when someone says no polling is allowed when implementing your threading solution, since it's wasteful, has latency and is non-deterministic? Threads should not use polling to signal each other.
EDIT
Based on your answers so far, I believe my threading implementation (taken from: http://www.albahari.com/threading/part2.aspx#_AutoResetEvent) below is not using polling. Please correct me if I am wrong.
using System;
using System.Threading;
using System.Collections.Generic;
class ProducerConsumerQueue : IDisposable
{
    EventWaitHandle _wh = new AutoResetEvent(false);
    Thread _worker;
    readonly object _locker = new object();
    Queue<string> _tasks = new Queue<string>();

    public ProducerConsumerQueue()
    {
        _worker = new Thread(Work);
        _worker.Start();
    }

    public void EnqueueTask(string task)
    {
        lock (_locker) _tasks.Enqueue(task);
        _wh.Set();
    }

    public void Dispose()
    {
        EnqueueTask(null); // Signal the consumer to exit.
        _worker.Join();    // Wait for the consumer's thread to finish.
        _wh.Close();       // Release any OS resources.
    }

    void Work()
    {
        while (true)
        {
            string task = null;
            lock (_locker)
                if (_tasks.Count > 0)
                {
                    task = _tasks.Dequeue();
                    if (task == null) return;
                }
            if (task != null)
            {
                Console.WriteLine("Performing task: " + task);
                Thread.Sleep(1000); // simulate work...
            }
            else
                _wh.WaitOne(); // No more tasks - wait for a signal
        }
    }
}
Your question is very unclear, but typically "polling" refers to periodically checking for a condition, or sampling a value. For example:
while (true)
{
    Task task = GetNextTask();
    if (task != null)
    {
        task.Execute();
    }
    else
    {
        Thread.Sleep(5000); // Avoid tight-looping
    }
}
Just sleeping is a relatively inefficient way of doing this - it's better if there's some coordination so that the thread can wake up immediately when something interesting happens, e.g. via Monitor.Wait/Pulse or Manual/AutoResetEvent... but depending on the context, that's not always possible.
In some contexts you may not want the thread to actually sleep - you may want it to become available for other work. For example, you might use a Timer of one sort or other to periodically poll a mailbox to see whether there's any incoming mail - but you don't need the thread to actually be sleeping when it's not checking; it can be reused by another thread-pool task.
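For example (a sketch; CheckMailbox is a hypothetical method standing in for the actual check):
// Polls every five minutes on a thread pool thread; no thread sleeps in between
var mailTimer = new System.Threading.Timer(
    _ => CheckMailbox(),
    null,
    TimeSpan.Zero,
    TimeSpan.FromMinutes(5));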
Here you go: check out this website:
http://msdn.microsoft.com/en-us/library/dsw9f9ts%28VS.71%29.aspx
Synchronization Techniques
There are two approaches to synchronization, polling and using synchronization objects. Polling repeatedly checks the status of an asynchronous call from within a loop. Polling is the least efficient way to manage threads because it wastes resources by repeatedly checking the status of the various thread properties.
For example, the IsAlive property can be used when polling to see if a thread has exited. Use this property with caution because a thread that is alive is not necessarily running. You can use the thread's ThreadState property to get more detailed information about a thread's status. Because threads can be in more than one state at any given time, the value stored in ThreadState can be a combination of the values in the System.Threading.Threadstate enumeration. Consequently, you should carefully check all relevant thread states when polling. For example, if a thread's state indicates that it is not Running, it may be done. On the other hand, it may be suspended or sleeping.
Waiting for a Thread to Finish
The Thread.Join method is useful for determining whether a thread has completed before starting another task. The Join method waits a specified amount of time for a thread to end. If the thread ends before the timeout, Join returns True; otherwise it returns False. For information on Join, see the Thread.Join method.
Polling sacrifices many of the advantages of multithreading in return for control over the order in which threads run. Because it is so inefficient, polling is generally not recommended. A more efficient approach would use the Join method to control threads. Join causes a calling procedure to wait either until a thread is done or until the call times out, if a timeout is specified. The name, join, is based on the idea that creating a new thread is a fork in the execution path. You use Join to merge separate execution paths into a single thread again.
One point should be clear: Join is a synchronous or blocking call. Once you call Join or a wait method of a wait handle, the calling procedure stops and waits for the thread to signal that it is done.
Sub JoinThreads()
    Dim Thread1 As New System.Threading.Thread(AddressOf SomeTask)
    Thread1.Start()
    Thread1.Join() ' Wait for the thread to finish.
    MsgBox("Thread is done")
End Sub
These simple ways of controlling threads, which are useful when you are managing a small number of threads, are difficult to use with large projects. The next section discusses some advanced techniques you can use to synchronize threads.
Hope this helps.
PK
Polling can be used in reference to the four asynchronous patterns .NET uses for delegate execution.
The 4 types (I've taken these descriptions from this well explained answer) are:
Polling: waiting in a loop for IAsyncResult.IsCompleted to be true
I'll call you
You call me
I don't care what happens (fire and forget)
So for an example of 1:
Action<IAsyncResult> myAction = (IAsyncResult ar) =>
{
    // Send Nigerian Prince emails
    Console.WriteLine("Starting task");
    Thread.Sleep(2000);

    // Finished
    Console.WriteLine("Finished task");
};

IAsyncResult result = myAction.BeginInvoke(null, null, null);
while (!result.IsCompleted)
{
    // Do something while you wait
    Console.WriteLine("I'm waiting...");
}
There are alternative ways of polling, but in general it means "Are we there yet?", "Are we there yet?", "Are we there yet?"
What does it mean when someone says no polling is allowed when implementing your threading solution, since it's wasteful, has latency and is non-deterministic? Threads should not use polling to signal each other.
I would have to see the context in which this statement was made to express an opinion on it either way. However, taken as-is it is patently false. Polling is a very common and very accepted strategy for signaling threads.
Pretty much all lock-free thread signaling strategies use polling in some form or another. This is clearly evident in how these strategies typically spin around in a loop until a certain condition is met.
The most frequently used scenario is the case of signaling a worker thread that it is time to terminate. The worker thread will periodically poll a bool flag at safe points to see if a shutdown was requested.
private volatile bool shutdownRequested;

void WorkerThread()
{
    while (true)
    {
        // Do some work here.

        // This is a safe point, so see if a shutdown was requested.
        if (shutdownRequested) break;

        // Do some more work here.
    }
}
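The requesting side is then just a flag write followed by a Join (a sketch; workerThread is assumed to be the Thread running WorkerThread):
shutdownRequested = true; // visible promptly to the worker because the field is volatile
workerThread.Join();      // wait for the worker to exit its loop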