Is there any change that a multiple Background Workers perform better than Tasks on 5 second running processes? I remember reading in a book that a Task is designed for short running processes.
The reasong I ask is this:
I have a process that takes 5 seconds to complete, and there are 4000 processes to complete. At first I did:
for (int i=0; i<4000; i++) {
Task.Factory.StartNewTask(action);
}
and this had a poor performance (after the first minute, 3-4 tasks where completed, and the console application had 35 threads). Maybe this was stupid, but I thought that the thread pool will handle this kind of situation (it will put all actions in a queue, and when a thread is free, it will take an action and execute it).
The second step now was to do manually Environment.ProcessorCount background workers, and all the actions to be placed in a ConcurentQueue. So the code would look something like this:
var workers = new List<BackgroundWorker>();
//initialize workers
workers.ForEach((bk) =>
{
bk.DoWork += (s, e) =>
{
while (toDoActions.Count > 0)
{
Action a;
if (toDoActions.TryDequeue(out a))
{
a();
}
}
}
bk.RunWorkerAsync();
});
This performed way better. It performed much better then the tasks even when I had 30 background workers (as much tasks as in the first case).
LE:
I start the Tasks like this:
public static Task IndexFile(string file)
{
Action<object> indexAction = new Action<object>((f) =>
{
Index((string)f);
});
return Task.Factory.StartNew(indexAction, file);
}
And the Index method is this one:
private static void Index(string file)
{
AudioDetectionServiceReference.AudioDetectionServiceClient client = new AudioDetectionServiceReference.AudioDetectionServiceClient();
client.IndexCompleted += (s, e) =>
{
if (e.Error != null)
{
if (FileError != null)
{
FileError(client,
new FileIndexErrorEventArgs((string)e.UserState, e.Error));
}
}
else
{
if (FileIndexed != null)
{
FileIndexed(client, new FileIndexedEventArgs((string)e.UserState));
}
}
};
using (IAudio proxy = new BassProxy())
{
List<int> max = new List<int>();
if (proxy.ReadFFTData(file, out max))
{
while (max.Count > 0 && max.First() == 0)
{
max.RemoveAt(0);
}
while (max.Count > 0 && max.Last() == 0)
{
max.RemoveAt(max.Count - 1);
}
client.IndexAsync(max.ToArray(), file, file);
}
else
{
throw new CouldNotIndexException(file, "The audio proxy did not return any data for this file.");
}
}
}
This methods reads from an mp3 file some data, using the Bass.net library. Then that data is sent to a WCF service, using the async method.
The IndexFile(string file) method, which creates tasks is called for 4000 times in a for loop.
Those two events, FileIndexed and FileError are not handled, so they are never thrown.
The reason why the performance for Tasks was so poor was because you mounted too many small tasks (4000). Remember the CPU needs to schedule the tasks as well, so mounting a lots of short-lived tasks causes extra work load for CPU. More information can be found in the second paragraph of TPL:
Starting with the .NET Framework 4, the TPL is the preferred way to
write multithreaded and parallel code. However, not all code is
suitable for parallelization; for example, if a loop performs only a
small amount of work on each iteration, or it doesn't run for many
iterations, then the overhead of parallelization can cause the code to
run more slowly.
When you used the background workers, you limited the number of possible alive threads to the ProcessCount. Which reduced a lot of scheduling overhead.
Given that you have a strictly defined list of things to do, I'd use the Parallel class (either For or ForEach depending on what suits you better). Furthermore you can pass a configuration parameter to any of these methods to control how many tasks are actually performed at the same time:
System.Threading.Tasks.Parallel.For(0, 20000, new ParallelOptions() { MaxDegreeOfParallelism = 5 }, i =>
{
//do something
});
The above code will perform 20000 operations, but will NOT perform more than 5 operations at the same time.
I SUSPECT the reason the background workers did better for you was because you had them created and instantiated at the start, while in your sample Task code it seems you're creating a new Task object for every operation.
Alternatively, did you think about using a fixed number of Task objects instantiated at the start and then performing a similar action with a ConcurrentQueue like you did with the background workers? That should also prove to be quite efficient.
Have you considered using threadpool?
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
If your performance is slower when using threads, it can only be due to threading overhead (allocating and destroying individual threads).
Related
Edit: As per the discussion in the comments, I was overestimating how much many threads would help, and have gone back to Parallell.ForEach with a reasonable MaxDegreeOfParallelism, and just have to wait it out.
I have a 2D array data structure, and perform work on slices of the data. There will only ever be around 1000 threads required to work on all the data simultaneously. Basically there are around 1000 "days" worth of data for all ~7000 data points, and I would like to process the data for each day in a new thread in parallel.
My issue is that doing work in the child threads dramatically slows the time in which the main thread starts them. If I have no work being done in the child threads, the main thread starts them all basically instantly. In my example below, with just a bit of work, it takes ~65ms to start all the threads. In my real use case, the worker threads will take around 5-10 seconds to compute all what they need, but I would like them all to start instantly otherwise, I am basically running the work in sequence. I do not understand why their work is slowing down the main thread from starting them.
How the data is setup shouldn't matter (I hope). The way it's setupmight look weird I was just simulating exactly how I receive the data. What's important is that if you comment out the foreach loop in the DoThreadWork method, the time it takes to start the threads is waaay lower.
I have the for (var i = 0; i < 4; i++) loop just to run the simulation multiple times to see 4 sets of timing results to make sure that it wasn't just slow the first time.
Here is a code snippet to simulate my real code:
public static void Main(string[] args)
{
var fakeData = Enumerable
.Range(0, 7000)
.Select(_ => Enumerable.Range(0, 400).ToArray())
.ToArray();
const int offset = 100;
var dataIndices = Enumerable
.Range(offset, 290)
.ToArray();
for (var i = 0; i < 4; i++)
{
var s = Stopwatch.StartNew();
var threads = dataIndices
.Select(n =>
{
var thread = new Thread(() =>
{
foreach (var fake in fakeData)
{
var sliced = new ArraySegment<int>(fake, n - offset, n - (n - offset));
DoThreadWork(sliced);
}
});
return thread;
})
.ToList();
foreach (var thread in threads)
{
thread.Start();
}
Console.WriteLine($"Before Join: {s.Elapsed.Milliseconds}");
foreach (var thread in threads)
{
thread.Join();
}
Console.WriteLine($"After Join: {s.Elapsed.Milliseconds}");
}
}
private static void DoThreadWork(ArraySegment<int> fakeData)
{
// Commenting out this foreach loop will dramatically increase the speed
// in which all the threads start
var a = 0;
foreach (var fake in fakeData)
{
// Simulate thread work
a += fake;
}
}
Use the thread/task pool and limit thread/task count to 2*(CPU Cores) at most. Creating more threads doesn't magically make more work get done as you need hardware "threads" to run them (1 per CPU core for non-SMT CPU's, 2 per core for Intel HT, AMD's SMT implementation). Executing hundreds to thousands of threads that don't have to passively await asynchronous callbacks (i.e. I/O) makes running the threads far less efficient due to thrashing the CPU with context switches for no reason.
i have this part of code
bool hasData = true;
using (Context context = new Context())
{
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(MAX_THREADS))
{
while (hasData)
{
Message message = context.Database.SqlQuery<Message>($#"
select top(1) * from Message where
( Status = {(int)MessageStatusEnum.Pending} ) or
( Status = {(int)MessageStatusEnum.Paused } and ResumeOn < GETUTCDATE() )
").FirstOrDefault();
if message == null)
{
hasData = false;
}
else
{
concurrencySemaphore.Wait();
tasks.Add(Task.Factory.StartNew(() =>
{
Process(message);
concurrencySemaphore.Release();
}, this.CancellationToken));
}
}
Task.WaitAll(tasks.ToArray());
}
}
And my Process function is something like this
private void Process(Message message)
{
System.Threading.Thread.Sleep(10000);
}
Now, if i have only 1 item that i want to Process then the total execution time is 10sec and the execution time per item(1 item) is 10 sec.
Well if i have 10 items for example, then the execution per item is increasing to 15-20sec.
I tried to change the value of MAX_THREADS but always if i have more than 10 items in my queue and start the parallel execution then the time of execution per item is about 15sec.
What i am missing?
Not unexpected. Parallel Slowdown is a very real problem. On the one end we have pleasingly parallel operations. On other end inherently serial stuff like calcuating the Fibonacci sequence. And there is a lot of area in between.
Multitasking in general and Multithreading in particular is not a magic bullet that makes everything faster. I like to say: Multitasking has to pick it's problems carefully. If you throw more tasks at a problem that is not designed for it, you end up with code that is:
more complex
more memory demanding
and most importanlty slower then the simple, sequential operation
At best you just added all that Task Management overhead to the operation. At worst you run into deadlocks and resource competition. And your example operation of Sleeping the Thread is not something that will benefit from Multitasking anyway. Please show us the actuall operation you want to speed up via Multitasking/-Threading. So we can tell you if there is any chance that can work out.
I have a current algorithm that goes like this.
public class Executor
{
private ParallelOptions options = new ParallelOptions();
private IList<Step> AllSteps;
public void Execute()
{
options.MaxDegreeOfParallelism = 4;
var rootSteps = AllSteps.Where(s => !s.Parents.Any());
Parallel.Foreach(rootSteps, options, RecursivelyExecuteStep);
}
private void RecursivelyExecuteStep(Step step)
{
ExecuteStep();
var childSteps = AllSteps.Where(s=>s.Parents.Contains(step)
&& step.Parents.All(p=>p.IsComplete);
Parallel.ForEach(childSteps, options, RecursivelyExecuteStep);
}
}
ParallelOptions.MaxDegreeOfParallelism will be an input variable (but left it out of the code example for brevity).
I was wondering if thread pooling is handled for me automatically or if this creates new threads every time. Also what's the best way to optimize this, is thread pooling something I want. How do I use thread pooling. I'm fairly new to multithreading and what's new in 4.5[.1]
Will this not limit the algorithm to only 4 threads because each Parallel.Foreach would have it's own MaxDegreeOfParallelism of 4 thus not limiting all the threads in the app to 4? How do I achieve limiting all threading in the app to 4?
Edit: MaxDegreeOfParallelism
You can solve this problem with TPL DataFlow library (you can get it via NuGet). As it is said in other answer, Parallel class is using the ThreadPool internally, and you should not be bothered with that.
With the TPL Dataflow the only thing you need is create an TransformManyBlock<TInput,TOutput> linked on itself (or link BufferBlock with ActionBlock with Encapsulate extension), and set the MaxDegreeOfParallelism = 4 or whatever constant you think it should be.
Parallel.Foreach basically is a nice way to queue up work items to the .NET ThreadPool.
Your application (process) has only one instance of the ThreadPool, and it tries to be as smart as possible regarding how many concurrent threads it uses, taking things like number of available cores and virtual memory into account.
So yes, the .NET ThreadPool handles thread pooling for you, and in many cases you don't need to worry about it, use Parallel.Foreach and let it get on with it.
EDIT: As noted by others, you should be careful in overusing the ThreadPool since it is a shared resource and it may disturb other parts of your application. It will also start creating new threads if your items are blocking or very long-running, which often is wasteful. A rule of thumb is that the work items should be relatively quick and preferably non-blocking. You should test and benchmark, and if it works for your use case then it is very convenient.
You can control the max number of concurrent threads used by the ThreadPool in your application if you want explicit control, by calling ThreadPool.SetMaxThreads. I'd advice against that unless you really have to though, and know what you are doing. The ThreadPool already tries to avoid using more concurrent threads than you have cores for example.
What you can do with ParallellOptions.MaxDegreeOfParallelism is only to further limit the number of concurrent ThreadPool threads that are used to execute that particular call to Parallel.Foreach.
If you need more explicit control of how many concurrent threads an invocation of your algorithm uses, here are some possible alternatives (in, arguably, increasing implementation complexity):
With the default ThreadPool you can't limit concurrency while calling Parellel.Foreach recursively. You could for example consider using Parallel.Foreach only on the top level (with a ParellelOptions.MaxDegreeOfParallelism argument) and let RecursivelyExecuteStep use a standard foreach.
Modify (or replace) the ThreadPool for your algorithm to limit concurrency by setting ParallelOptions.TaskScheduler to an instance of QueuedTaskScheduler from Parallel Extension Extras as described here.
As suggested by #VMAtm, you can use TPL Dataflow to get more
explicit control of how your computations are performed, including
concurrency (this can also be combined with a custom task scheduler if you
really want to knock yourself out).
A simple straightforward implementation could look like the following:
ParallelOptions Options = new ParallelOptions{MaxDegreeOfParallelism = 4};
IList<Step> AllSteps;
public void Execute()
{
var RemainingSteps = new HashSet<Step>(AllSteps);
while(RemainingSteps.Count > 0)
{
var ExecutableSteps = RemainingSteps.Where(s => s.Parents.All(p => p.IsComplete)).ToList();
Parallel.ForEach(ExecutableSteps, Options, ExecuteStep);
RemainingSteps.ExceptWith(ExecutableSteps);
}
}
Granted, this will execute steps in phases, so you will not always have maximum concurrency. You may only be executing one step at the end of each phase, since the next steps to execute are only realized after all steps in the current phase complete.
If you want to improve concurrency, I would suggest using a BlockingCollection. You'll need to implement a custom partitioner to use Parallel.ForEach against the blocking collection in this case. You'll also want a concurrent collection of the remaining steps, so that you don't queue the same step multiple times (the race condition previously commented on).
public class Executor
{
ParallelOptions Options = new ParallelOptions() { MaxDegreeOfParallelism = 4 };
IList<Step> AllSteps;
//concurrent hashset of remaining steps (used to prevent race conditions)
ConcurentDictionary<Step, Step> RemainingSteps = new ConcurentDictionary<Step, Step>();
//blocking collection of steps that can execute next
BlockingCollection<Step> ExecutionQueue = new BlockingCollection<Step>();
public void Execute()
{
foreach(var step in AllSteps)
{
if(step.Parents.All(p => p.IsComplete))
{
ExecutionQueue.Add(step);
}
else
{
RemainingSteps.Add(step, step);
}
}
Parallel.ForEach(
GetConsumingPartitioner(ExecutionQueue),
Options,
Execute);
}
void Execute(Step step)
{
ExecuteStep(step);
if(RemainingSteps.IsEmpty)
{
//we're done, all steps are complete
executionQueue.CompleteAdding();
return;
}
//queue up the steps that can execute next (concurrent dictionary enumeration returns a copy, so subsequent removal is safe)
foreach(var step in RemainingSteps.Values.Where(s => s.Parents.All(p => p.IsComplete)))
{
//note, removal only occurs once, so this elimiates the race condition
Step NextStep;
if(RemainingSteps.TryRemove(step, out NextStep))
{
executionQueue.Add(NextStep);
}
}
}
Partitioner<T> GetConsumingPartitioner<T>(BlockingCollection<T> collection)
{
return new BlockingCollectionPartitioner<T>(collection);
}
class BlockingCollectionPartitioner<T> : Partitioner<T>
{
readonly BlockingCollection<T> Collection;
public BlockingCollectionPartitioner(BlockingCollection<T> collection)
{
if (collection == null) throw new ArgumentNullException("collection");
Collection = collection;
}
public override bool SupportsDynamicPartitions { get { return true; } }
public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
{
if (partitionCount < 1) throw new ArgumentOutOfRangeException("partitionCount");
var Enumerable = GetDynamicPartitions();
return Enumerable.Range(0, partitionCount)
.Select(i => Enumerable.GetEnumerator()).ToList();
}
public override IEnumerable<T> GetDynamicPartitions()
{
return Collection.GetConsumingEnumerable();
}
}
}
I have an application which process images extracted from database. I need to process various images in parallel, and because of that i'm using .NET TPL with Tasks.
My application is a Winforms app in C#. I just have the option to choose how many processes will start and a Start button. The way that i'm doing right now is this:
private Action getBusinessAction(int numProcess) {
return () =>
{
try {
while (true) {
(new BusinessClass()).doSomeProcess(numProcess);
tokenCancelacion.ThrowIfCancellationRequested();
}
}
catch (OperationCanceledException ex) {
Console.WriteLine("Cancelled process");
}
};
}
...
for (int cnt = 0; cnt < NUM_OF_MAX_PROCESS; cnt++) {
Task.Factory.StartNew(getBusinessAction(cnt + 1), tokenCancelacion);
}
In this case, if i choose 8 as the number of processes, it starts 8 tasks which are running until application is closed. But i think that a better approach will be to start a number of tasks which calls doSomeProcess method, and then finish. But when a task finishes, i would like to start the same task or start a new instance of the task that does the same, in order to having always that number of processes running in parallel. Is there a way in TPL to achieve this?
This sounds like a good fit for TPL Dataflow's ActionBlock. You create a single block at the start. Set how many items it should process concurrently (i.e. 8) and post the items into it:
var block = new ActionBlock<Image>(
image => ProcessImage(image),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8});
foreach (var image in GetImages())
{
block.Post(image);
}
This will take care of creating up to 8 tasks when needed and when they aren't anymore the number would go down.
When you are done you should signal the block for completion and wait for it to complete:
block.Complete();
await block.Completion;
I have a C# requirement for individually processing a 'great many' (perhaps > 100,000) records. Running this process sequentially is proving to be very slow with each record taking a good second or so to complete (with a timeout error set at 5 seconds).
I would like to try running these tasks asynchronously by using a set number of worker 'threads' (I use the term 'thread' here cautiously as I am not sure if I should be looking at a thread, or a task or something else).
I have looked at the ThreadPool, but I can't imagine it could queue the volume of requests required. My ideal pseudo code would look something like this...
public void ProcessRecords() {
SetMaxNumberOfThreads(20);
MyRecord rec;
while ((rec = GetNextRecord()) != null) {
var task = WaitForNextAvailableThreadFromPool(ProcessRecord(rec));
task.Start()
}
}
I will also need a mechanism that the processing method can report back to the parent/calling class.
Can anyone point me in the right direction with perhaps some example code?
A possible simple solution would be to use a TPL Dataflow block which is a higher abstraction over the TPL with configurations for degree of parallelism and so forth. You simply create the block (ActionBlock in this case), Post everything to it, wait asynchronously for completion and TPL Dataflow handles all the rest for you:
var block = new ActionBlock<MyRecord>(
rec => ProcessRecord(rec),
new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 20});
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
block.Post(rec);
}
block.Complete();
await block.Completion
Another benefit is that the block starts working as soon as the first record arrives and not only when all the records have been received.
If you need to report back on each record you can use a TransformBlock to do the actual processing and link an ActionBlock to it that does the updates:
var transform = new TransfromBlock<MyRecord, Report>(rec =>
{
ProcessRecord(rec);
return GenerateReport(rec);
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 20});
var reporter = new ActionBlock<Report>(report =>
{
RaiseEvent(report) // Or any other mechanism...
});
transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
transform.Post(rec);
}
transform.Complete();
await transform.Completion
Have you thought about using parallel processing with Actions?
ie, create a method to process a single record, add each record method as an action into a list, and then perform a parrallel.for on the list.
Dim list As New List(Of Action)
list.Add(New Action(Sub() MyMethod(myParameter)))
Parallel.ForEach(list, Sub(t) t.Invoke())
This is in vb.net, but I think you get the gist.