Using thread-local resources with parallel for loops - C#

Imagine a long list of data to be processed. Processing is CPU bound and can be done in parallel.
To process a data item requires a large object (~50MB) to hold intermediate processing results. This object may be re-used during the processing of a subsequent task.
I want to do something like this:
Processor[] processors = GetProcessors(Environment.ProcessorCount);
Parallel.For(
0,
itemCount,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
item =>
{
int threadIndex = /* TODO */;
processors[threadIndex].Process(item);
}
);
The goal is to only ever have Environment.ProcessorCount instances of my large object, and reuse them as efficiently as possible.
How can I do this?

You just need to use the overload of Parallel.For that takes the two extra functions to set up and tear down your thread-local object. Note that in this overload the ParallelOptions argument comes before the local-state initializer:
Parallel.For(
    0,
    itemCount,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    () => new Processor(),
    (item, loopState, processor) =>
    {
        processor.Process(item);
        // Return the processor so it is passed to the next invocation on this task
        return processor;
    },
    processor =>
    {
        // Do any teardown work you need to do, like disposing the object if it is disposable
        processor.Dispose();
    }
);
Because the Parallel class functions do not immediately jump to using ParallelOptions.MaxDegreeOfParallelism threads (they start with one thread, then ramp up to the maximum you defined), only one instance of Processor is created if only one thread ever gets created, and at most ParallelOptions.MaxDegreeOfParallelism instances are in use at the same time.
I do not know the implementation details of the default scheduler, but it may or may not retire threads and create new ones, which would cause a new Processor object to be created. Even if that happens (and it may not), you will still have at most ParallelOptions.MaxDegreeOfParallelism objects alive at any one time.

Here's an approach that works. I devised this answer while writing the question, but thought the question was interesting enough to post anyway. If someone has a better solution, I'd like to learn it.
Use a concurrent collection (such as ConcurrentQueue<Processor>) to distribute the Processor instances between threads.
Processor[] processors = GetProcessors(Environment.ProcessorCount);
var queue = new ConcurrentQueue<Processor>(processors);
Parallel.For(
    0,
    itemCount,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    item =>
    {
        // Obtain a processor from the shared pool
        Processor processor;
        queue.TryDequeue(out processor);
        processor.Process(item);
        // Return the processor to the pool for another invocation
        queue.Enqueue(processor);
    }
);
An actual implementation should assert that TryDequeue returns true, and also enqueue the processors again in case of an exception.
So long as the processing time is much larger than the time spent in queue contention, the overhead should be minimal.
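For reference, a hardened version of just the loop body might look like this (a sketch, not a drop-in replacement; Debug.Assert is from System.Diagnostics, and the try/finally guarantees the processor is returned even if Process throws):
    item =>
    {
        Processor processor;
        bool obtained = queue.TryDequeue(out processor);
        Debug.Assert(obtained, "Pool should never be empty while concurrency <= pool size");
        try
        {
            processor.Process(item);
        }
        finally
        {
            // Always return the processor, even on exception
            queue.Enqueue(processor);
        }
    }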

Related

Is it the correct implementation?

I have a Windows Service that needs to pick jobs from a database and process them.
Here, each job is a scanning process that takes approximately 10 minutes to complete.
I am very new to the Task Parallel Library. I have implemented the following as sample logic:
Queue queue = new Queue();
for (int i = 0; i < 10000; i++)
{
    queue.Enqueue(i);
}
for (int i = 0; i < 100; i++)
{
    Task.Factory.StartNew((Object data) =>
    {
        var Objdata = (Queue)data;
        Console.WriteLine(Objdata.Dequeue());
        Console.WriteLine(
            "The current thread is " + Thread.CurrentThread.ManagedThreadId);
    }, queue, TaskCreationOptions.LongRunning);
}
Console.ReadLine();
Console.ReadLine();
But this creates a lot of threads. Since the loop repeats 100 times, it creates 100 threads.
Is it the right approach to create that many parallel threads?
Is there any way to limit the number of threads to 10 (the concurrency level)?
An important factor to remember when allocating new threads is that the OS has to allocate a number of logical entities for each thread to run:
Thread kernel object - an object describing the thread, including the thread's context, CPU registers, etc.
Thread environment block - for exception handling and thread-local storage
User-mode stack - 1 MB of stack
Kernel-mode stack - for passing arguments from user mode to kernel mode
Other than that, the number of threads that can actually run concurrently depends on the number of cores your machine is packing, and creating more threads than your machine has cores causes context switching, which in the long run may slow your work down.
So, after the long intro, to the good stuff. What we actually want to do is limit the number of threads running and reuse them as much as possible.
For this kind of job I would go with TPL Dataflow, which is based on the producer-consumer pattern. Just a small example of what can be done:
// Requires the TPL Dataflow library (System.Threading.Tasks.Dataflow).
// A BufferBlock is the equivalent of a ConcurrentQueue, used to buffer your objects.
var bufferBlock = new BufferBlock<object>();
// An ActionBlock to process each object and do something with it
var actionBlock = new ActionBlock<object>(obj =>
{
    // Do stuff with the objects from the bufferBlock
});
bufferBlock.LinkTo(actionBlock);
bufferBlock.Completion.ContinueWith(t => actionBlock.Complete());
You may pass each block an ExecutionDataflowBlockOptions instance, which lets you limit BoundedCapacity (the number of objects the block will hold) and set MaxDegreeOfParallelism (the maximum concurrency you want for the block).
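For instance, a sketch of an ActionBlock capped at the 10 concurrent jobs the question asks for (ProcessJob is a placeholder for the real per-job scanning work):
var actionBlock = new ActionBlock<object>(obj =>
{
    ProcessJob(obj); // placeholder for your scanning work
},
new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10, // at most 10 jobs in flight at once
    BoundedCapacity = 100        // buffer at most 100 waiting items
});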
There is a good example here to get you started.
Glad you asked, because you're right in the sense that this is not the best approach.
The concept of a Task should not be confused with a Thread. A Thread can be compared to a chef in a kitchen, while a Task is a dish ordered by a customer. You have a bunch of chefs, and they process the dish orders in some order (usually FIFO). A chef finishes a dish, then moves on to the next. The concept of a thread pool is the same. You create a bunch of Tasks to be completed, but you do not need to assign a new thread to each task.
OK, so the actual bits to do it. There are a few. The first one is ThreadPool.QueueUserWorkItem (http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem(v=vs.110).aspx). Using the Parallel library, Parallel.For can also be used; it will automatically spawn threads based on the number of CPU cores available in the system.
Parallel.For(0, 100, i =>
{
    // This body is called 100 times, with i ranging from 0 to 99
    WaitForGrassToGrow();
    Console.WriteLine(string.Format("The {0}-th task has completed!", i));
});
Note that there is no guarantee that the method called by Parallel.For is called in sequence (0,1,2,3,4,5...). The actual sequence depends on the execution.
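To directly cap the concurrency at 10, as the question asks, you can also hand Parallel.For a ParallelOptions (a minimal sketch, reusing the placeholder method above):
Parallel.For(0, 100,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    i =>
    {
        WaitForGrassToGrow();
    });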

N number of threads asynchronously getting/executing tasks

I have an unlimited number of tasks in a db queue somewhere. What is the best way to have a program working on n tasks simultaneously on n different threads, starting new tasks as old ones get done? When one task finishes, another task should asynchronously begin. The currently-running count should always be n.
My initial thought was to use a thread pool, but that seems unnecessary considering that the tasks to be worked on will be retrieved within the individual threads. In other words, each thread will on its own go get its next task rather than having a main thread get tasks and then distribute them.
I see multiple options for doing this, and I don't know which one I should use for optimal performance.
1) Thread Pool - In light of there not necessarily being any waiting threads, I'm not sure this is necessary.
2) Semaphore - Same as 1. What's the benefit of a semaphore if there aren't tasks waiting to be allocated by the main thread?
3) Same Threads Forever - Kick the program off with n threads. When a thread is done working, it gets the next task itself. The main thread just monitors to make sure the n threads are still alive.
4) Event Handling - Same as 3, except that when a thread finishes a task, it fires off an ImFinished event before dying. An ImFinished event handler kicks off a new thread. This seems just like 3 but with more overhead (since new threads are constantly being created)
5) Something else?
BlockingCollection makes this whole thing pretty trivial:
var queue = new BlockingCollection<Action>();

int numWorkers = 5;
for (int i = 0; i < numWorkers; i++)
{
    Thread t = new Thread(() =>
    {
        foreach (var action in queue.GetConsumingEnumerable())
        {
            action();
        }
    });
    t.Start();
}
You can then have the main thread add items to the blocking collection after starting the workers (or before, if you want). You can even spawn multiple producer threads to add items to the queue.
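For example, the producer side might look like this (a sketch; the Action bodies stand in for your real work items):
queue.Add(() => Console.WriteLine("work item 1"));
queue.Add(() => Console.WriteLine("work item 2"));
// Signal that no more items are coming, so GetConsumingEnumerable
// (and therefore each worker's foreach) can complete
queue.CompleteAdding();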
Note that the more conventional approach would be to use Tasks instead of using Thread classes directly. The primary reason I didn't suggest it first is that you specifically requested an exact number of threads to be running (rather than a maximum), and you just don't have as much control over how Task objects are run (which is good; they can be optimized on your behalf). If that control isn't as important as you have stated, the following may end up being preferable:
var queue = new BlockingCollection<Action>();

int numWorkers = 5;
for (int i = 0; i < numWorkers; i++)
{
    Task.Factory.StartNew(() =>
    {
        foreach (var action in queue.GetConsumingEnumerable())
        {
            action();
        }
    }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
}
I like model #3, and have used it before; it reduces the number of threads starting and stopping, and makes the main thread a true "supervisor", reducing the work it has to do.
As Servy has indicated, the System.Collections.Concurrent namespace has a few constructs that are extremely valuable here. ConcurrentQueue is a thread-safe FIFO collection designed to be used in just such a model: one or more "producer" threads add elements to the "input" side of the queue, while one or more "consumers" take elements out of the other end. If there is nothing in the queue, the call to get an item (TryDequeue) simply returns false; you can react to that by exiting the task method (the supervisor can then decide whether to start another task, probably by monitoring the input to the queue and ramping up when more items come in).
BlockingCollection adds the behavior of making threads wait when they attempt to take a value from an empty queue. It can also be configured with a maximum capacity, above which it will block "producer" threads from adding more elements until there is available capacity. BlockingCollection uses a ConcurrentQueue by default, but you can set it up over a ConcurrentStack or ConcurrentBag if you wish. Using this model, you can have the tasks run indefinitely; when there's nothing to do they'll simply block until there is something for at least one of them to work on, so all the supervisor has to check for is tasks erroring out (a critical element of any robust threaded workflow pattern).
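As a concrete illustration of those two behaviors, a bounded, stack-backed collection can be constructed like this (a sketch):
// LIFO and capacity-bounded: producers block once 100 items are queued,
// consumers block while the collection is empty
var work = new BlockingCollection<Action>(new ConcurrentStack<Action>(), 100);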
This is easily achieved with the TPL Dataflow library.
First, let's assume you have a BufferBlock<T>, this is your queue:
var queue = new BufferBlock<T>();
Then, you need the action to perform on the block, this is represented by the ActionBlock<T> class:
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions
    {
        // Number of concurrent tasks
        MaxDegreeOfParallelism = ...,
    });
Note the constructor above, it takes an instance of ExecutionDataflowBlockOptions and sets the MaxDegreeOfParallelism property to however many concurrent items you want to be processed at the same time.
Underneath the surface, the Task Parallel Library is being used to handle allocating threads for tasks, etc. TPL Dataflow is meant to be a higher level abstraction which allows you to tweak just how much parallelism/throttling/etc that you want.
For example, if you didn't want the ActionBlock<TInput> to buffer any items (preferring them to live in the BufferBlock<T>), you can also set the BoundedCapacity property, which will limit the number of items that the ActionBlock<TInput> will hold onto at once (which includes the number of items being processed, as well as reserved items):
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions
    {
        // Number of concurrent tasks
        MaxDegreeOfParallelism = ...,
        // Set to MaxDegreeOfParallelism to not buffer
        BoundedCapacity = ...,
    });
Also, if you want a new, fresh Task<TResult> instance to process every item, then you can set the MaxMessagesPerTask property to one, indicating that each and every Task<TResult> will process one item:
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions
    {
        // Number of concurrent tasks
        MaxDegreeOfParallelism = ...,
        // Set to MaxDegreeOfParallelism to not buffer
        BoundedCapacity = ...,
        // Process one item per task
        MaxMessagesPerTask = 1,
    });
Note that depending on how many other tasks your application is running, this might or might not be optimal for you, and you might also want to think of the cost of spinning up a new task for every item that comes through the ActionBlock<TInput>.
From there, it's a simple matter of linking the BufferBlock<T> to the ActionBlock<TInput> with a call to the LinkTo method:
IDisposable connection = queue.LinkTo(action, new DataflowLinkOptions
{
    PropagateCompletion = true
});
You set the PropagateCompletion property to true here so that when the BufferBlock<T> completes (if/when there are no more items to process), the completion is propagated to the ActionBlock<TInput>, which you might subsequently wait on.
Note that you can call the Dispose method on the IDisposable implementation returned from the call to LinkTo if you want the link between the blocks to be removed.
Finally, you post items to the buffer using the Post method:
queue.Post(new T());
And when you're done (if you are ever done), you call the Complete method:
queue.Complete();
Then, on the action block, you can wait until it's done by waiting on the Task instance exposed by the Completion property:
action.Completion.Wait();
Hopefully, the elegance of this is clear:
You don't have to manage the creation of new Task instances/threads/etc to manage the work, the blocks do it for you based on the settings you provide (and this is on a per-block basis).
Cleaner separation of concerns. The buffer is separated from the action, as are all the other blocks. You build the blocks and then link them together.
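Putting those pieces together, an end-to-end sketch might look like this (the Console.WriteLine stands in for real processing):
var queue = new BufferBlock<int>();
var action = new ActionBlock<int>(i => Console.WriteLine(i), // placeholder processing
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });

queue.LinkTo(action, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < 100; i++)
    queue.Post(i);

queue.Complete();          // no more items
action.Completion.Wait();  // blocks until all 100 items are processed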
I'm a VB guy, but you can easily translate:
Private Async Sub foo()
    Dim n As Integer = 16
    Dim l As New List(Of Task)
    Dim jobs As New Queue(Of Integer)(Enumerable.Range(1, 100))
    For i = 1 To n
        Dim j = jobs.Dequeue()
        l.Add(Task.Run(Sub()
                           Threading.Thread.Sleep(500)
                           Console.WriteLine(j)
                       End Sub))
    Next
    While l.Count > 0
        Dim t = Await Task.WhenAny(l)
        If jobs.Count > 0 Then
            Dim j = jobs.Dequeue()
            l(l.IndexOf(t)) = Task.Run(Sub()
                                           Threading.Thread.Sleep(500)
                                           Console.WriteLine(j)
                                       End Sub)
        Else
            l.Remove(t)
        End If
    End While
End Sub
There's an article from Stephen Toub on why you shouldn't use Task.WhenAny in this way with a large list of tasks, but with "some" tasks you usually don't run into a problem.
The idea is quite simple: you have a list to which you add as many (running) tasks as you would like to run in parallel. Then you (a)wait for the first one to finish. If there are still jobs in the queue, you assign the next job to a new task and (a)wait again. If there are no jobs in the queue, you simply remove the finished task. If both your task list and the queue are empty, you are done.
The Stephen Toub article: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx
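Since the rest of this page is C#, here is a rough, untested C# translation of the sketch above (same idea: 16 tasks in flight, refilled from the queue as each completes; assumes the usual usings for System.Linq, System.Threading, and System.Threading.Tasks):
private static async Task Foo()
{
    int n = 16;
    var l = new List<Task>();
    var jobs = new Queue<int>(Enumerable.Range(1, 100));
    for (int i = 0; i < n; i++)
    {
        int j = jobs.Dequeue();
        l.Add(Task.Run(() =>
        {
            Thread.Sleep(500); // simulated work
            Console.WriteLine(j);
        }));
    }
    while (l.Count > 0)
    {
        Task t = await Task.WhenAny(l);
        if (jobs.Count > 0)
        {
            int j = jobs.Dequeue();
            // Replace the finished task with a fresh one for the next job
            l[l.IndexOf(t)] = Task.Run(() =>
            {
                Thread.Sleep(500);
                Console.WriteLine(j);
            });
        }
        else
        {
            l.Remove(t);
        }
    }
}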

Parallel tasks with a long pause

I have a function which is along the lines of
private void DoSomethingToFeed(IFeed feed)
{
    feed.SendData();             // Send data to remote server
    Thread.Sleep(1000 * 60 * 5); // Sleep 5 minutes
    feed.GetResults();           // Get data from remote server after it's processed it
}
I want to parallelize this, since I have lots of feeds that are all independent of each other. Based on this answer, leaving the Thread.Sleep() in there is not a good idea. I also want to wait after all the threads have spun up, until they've all had a chance to get their results.
What's the best way to handle a scenario like this?
Edit, because I accidentally left it out: I had originally considered calling this function as Parallel.ForEach(feeds, DoSomethingToFeed), but I was wondering if there was a better way to handle the sleeping when I found the answer I linked to.
Unless you have an awful lot of threads, you can keep it simple. Create all the threads. You'll get some thread creation overhead, but since the threads are basically sleeping the whole time, you won't get too much context switching.
It'll be easier to code than any other solution (unless you're using C# 5). So start with that, and improve it only if you actually see a performance problem.
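A minimal sketch of that approach, assuming feeds is the question's collection of IFeed and DoSomethingToFeed is unchanged:
var threads = feeds.Select(feed => new Thread(() => DoSomethingToFeed(feed))).ToList();
foreach (var thread in threads)
    thread.Start();
foreach (var thread in threads)
    thread.Join(); // wait until every feed has fetched its results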
I think you should take a look at the Task class in .NET. It is a nice abstraction on top of lower-level threading and thread-pool management.
In order to wait for all tasks to complete, you can use Task.WaitAll.
An example use of Tasks could look like:
IFeed feedOne = new SomeFeed();
IFeed feedTwo = new SomeFeed();
var t1 = Task.Factory.StartNew(() => { feedOne.SendData(); });
var t2 = Task.Factory.StartNew(() => { feedTwo.SendData(); });
// Waits for all provided tasks to finish execution
Task.WaitAll(t1, t2);
However, another solution would be to use Parallel.ForEach, which handles all Task creation for you and does the appropriate batching of tasks as well. A good comparison of the two approaches is given here, where, among other good points, it is stated that:
Parallel.ForEach, internally, uses a Partitioner to distribute your collection into work items. It will not do one task per item, but rather batch this to lower the overhead involved.
Check WaitHandle for waiting on tasks.
private void DoSomethingToFeed(IFeed feed)
{
    Task.Factory.StartNew(() => feed.SendData())
        .ContinueWith(_ => Delay(1000 * 60 * 5)
            .ContinueWith(__ => feed.GetResults())
        );
}

// Adapted from http://stevenhollidge.blogspot.com/2012/06/async-taskdelay.html
Task Delay(int milliseconds)
{
    var tcs = new TaskCompletionSource<object>();
    // Keep a reference to the timer so it isn't garbage-collected before it fires
    System.Threading.Timer timer = null;
    timer = new System.Threading.Timer(_ =>
    {
        timer.Dispose();
        tcs.SetResult(null);
    });
    timer.Change(milliseconds, -1);
    return tcs.Task;
}
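On .NET 4.5 / C# 5, the same pipeline collapses into Task.Delay with async/await (a sketch):
private async Task DoSomethingToFeedAsync(IFeed feed)
{
    feed.SendData();
    await Task.Delay(TimeSpan.FromMinutes(5)); // no thread is blocked during the wait
    feed.GetResults();
}
You can then start one of these per feed and await Task.WhenAll over them to wait for all the results.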

ThreadPool frustrations - Thread creation exceeding SetMaxThreads

I've got an I/O intensive operation.
I only want a MAX of 5 threads ever running at one time.
I've got 8000 tasks to queue and complete.
Each task takes approximately 15-20 seconds to execute.
I've looked around at ThreadPool, but
ThreadPool.SetMaxThreads(5, 0);
List<task> tasks = GetTasks();
int toProcess = tasks.Count;
ManualResetEvent resetEvent = new ManualResetEvent(false);

for (int i = 0; i < tasks.Count; i++)
{
    ReportGenerator worker = new ReportGenerator(tasks[i].Code, id);
    ThreadPool.QueueUserWorkItem(x =>
    {
        worker.Go();
        if (Interlocked.Decrement(ref toProcess) == 0)
            resetEvent.Set();
    });
}

resetEvent.WaitOne();
I cannot figure out why... my code is executing more than 5 threads at one time. I've tried to setmaxthreads, setminthreads, but it keeps executing more than 5 threads.
What is happening? What am I missing? Should I be doing this in another way?
Thanks
There is a limitation in SetMaxThreads in that you can never set it lower than the number of processors on the system. If you have 8 processors, setting it to 5 is the same as not calling the function at all.
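You can observe this directly, since SetMaxThreads returns false when the request is rejected (a quick check, assuming an 8-core machine):
bool accepted = ThreadPool.SetMaxThreads(5, 0);
Console.WriteLine("Accepted: " + accepted +
    ", cores: " + Environment.ProcessorCount); // accepted is false on 8 cores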
The Task Parallel Library can help you:
List<task> tasks = GetTasks();
Parallel.ForEach(tasks, new ParallelOptions { MaxDegreeOfParallelism = 5 },
    task =>
    {
        ReportGenerator worker = new ReportGenerator(task.Code, id);
        worker.Go();
    });
What does MaxDegreeOfParallelism do?
I think there's a different and better way to approach this. (Pardon me if I accidentally Java-ize some of the syntax)
The main thread here has a lists of things to do in "Tasks" -- instead of creating threads for each task, which is really not efficient when you have so many items, create the desired number of threads and then have them request tasks from the list as needed.
The first thing to do is add a variable to the class this code comes from, for use as a pointer into the list. We'll also add one for the maximum desired thread count.
// New variables in your class definition
private int taskStackPointer;
private const int MAX_THREADS = 5;
Create a method that returns the next task in the list and increments the stack pointer. Then create a new interface for this:
// Make sure that only one thread has access at a time
[MethodImpl(MethodImplOptions.Synchronized)]
public task getNextTask()
{
    if (taskStackPointer < tasks.Count)
        return tasks[taskStackPointer++];
    else
        return null;
}
Alternately, you could return tasks[taskStackPointer++].code, if there's a value you can designate as meaning "end of list". Probably easier to do it this way, however.
The interface:
public interface TaskDispatcher
{
    [MethodImpl(MethodImplOptions.Synchronized)]
    task getNextTask();
}
Within the ReportGenerator class, change the constructor to accept the dispatcher object:
public ReportGenerator(TaskDispatcher td, int idCode)
{
    ...
}
You'll also need to alter the ReportGenerator class so that the processing has an outer loop that starts off by calling td.getNextTask() to request a new task, and which exits the loop when it gets back a NULL.
Finally, alter the thread creation code to something like this: (this is just to give you an idea)
taskStackPointer = 0;
for (int i = 0; i < MAX_THREADS; i++)
{
    ReportGenerator worker = new ReportGenerator(this, id);
    worker.Go();
}
That way you create the desired number of threads and keep them all working at max capacity.
(I'm not sure I got the usage of "[MethodImpl(MethodImplOptions.Synchronized)]" exactly right... I am more used to Java than C#)
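For what it's worth, the synchronized dispatcher above is more commonly written with a lock statement in idiomatic C# (a sketch reusing the same hypothetical names):
private readonly object taskLock = new object();

public task getNextTask()
{
    lock (taskLock)
    {
        if (taskStackPointer < tasks.Count)
            return tasks[taskStackPointer++];
        return null; // end of list
    }
}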
Your tasks list will have 8k items in it because you told the code to put them there:
List<task> tasks = GetTasks();
That said, this number has nothing to do with how many threads are being used in the sense that the debugger is always going to show how many items you added to the list.
There are various ways to determine how many threads are in use. Perhaps one of the simplest is to break into the application with the debugger and take a look at the threads window. Not only will you get a count, but you'll see what each thread is doing (or not) which leads me to...
There is significant discussion to be had about what your tasks are doing and how you arrived at a number to 'throttle' the thread pool. In most use cases, the thread pool is going to do the right thing.
Now to answer your specific question...
To explicitly control the number of concurrent tasks, consider a trivial implementation that changes your task collection from a List to a BlockingCollection (which internally uses a ConcurrentQueue) and uses the following code to 'consume' the work:
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 5
};

Parallel.ForEach(collection.GetConsumingEnumerable(), options, x =>
{
    // Do work here...
});
Change MaxDegreeOfParallelism to whatever concurrent value you have determined is appropriate for the work you are doing.
The following might be of interest to you:
Parallel.ForEach Method
BlockingCollection
Chris
It works for me. This way you can't use a number of worker threads smaller than minworkerThreads. The problem is that if you need a maximum of five worker threads while minworkerThreads is six, it doesn't work.
int minworkerThreads, minportThreads;
ThreadPool.GetMinThreads(out minworkerThreads, out minportThreads);
ThreadPool.SetMaxThreads(minworkerThreads, minportThreads);
MSDN
Remarks
You cannot set the maximum number of worker threads or I/O completion threads to a number smaller than the number of processors on the computer. To determine how many processors are present, retrieve the value of the Environment.ProcessorCount property. In addition, you cannot set the maximum number of worker threads or I/O completion threads to a number smaller than the corresponding minimum number of worker threads or I/O completion threads. To determine the minimum thread pool size, call the GetMinThreads method.
If the common language runtime is hosted, for example by Internet Information Services (IIS) or SQL Server, the host can limit or prevent changes to the thread pool size.
Use caution when changing the maximum number of threads in the thread pool. While your code might benefit, the changes might have an adverse effect on code libraries you use.
Setting the thread pool size too large can cause performance problems. If too many threads are executing at the same time, the task switching overhead becomes a significant factor.

Using Parallelization for relatively large loops

I have an 8-core CPU machine with 8 GB of memory. Logically the following code can be done in parallel: the loop exposes more than enough opportunities for parallelism, since I have far fewer cores available than the size of the loop. Second, every delegate expression allocates some memory to hold the free variables.
Is it recommended to use a parallel for in this case?
Also, will separating the two Parallel.For loops into two tasks improve performance in this case?
private static void DoWork()
{
    int end1 = 100; // minimum of 100 values
    int end2 = 100; // minimum of 100 values

    Task a = Task.Factory.StartNew(
        delegate
        {
            Parallel.For(0, end1, delegate(int i)
            {
                // independent work
            });
        }
    );

    Task b = Task.Factory.StartNew(
        delegate
        {
            Parallel.For(0, end2, delegate(int i)
            {
                // independent work
            });
        }
    );

    a.Wait();
    b.Wait();
}
Also, will separating the two Parallel.For loops into two tasks improve performance in this case?
Not noticeably, and you could easily harm performance.
The TPL is especially designed to provide load balancing, let it do its job.
The main points of concern here are:
the 'work' should really be independent
the 'work' should be non-trivial, ie computationally intensive and considerably more than just adding a few numbers
the 'work' should avoid I/O (as much as possible)
Leave the load balancing to the framework by calling "Partitioner.Create".
Try creating a ParallelOptions object and passing it to Parallel.For. Experiment with different MaxDegreeOfParallelism values and tune your code based on the results; this number can be higher than the number of cores in your system. This has worked for me.
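For example, a sketch of that tuning loop, reusing the question's end1 (the value 16 is just one setting to try):
var options = new ParallelOptions { MaxDegreeOfParallelism = 16 }; // try different values here
Parallel.For(0, end1, options, i =>
{
    // independent work
});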
