I have certain objects on which certain tasks need to be performed, and every task needs to be performed on every object. I want to employ multiple threads, say N parallel threads.
Say I have object identifiers like A, B, C (objects can be in the 100K range; keys can be long or string)
And tasks can be T1, T2, T3, ..., TN (tasks are at most 20 in number)
Conditions for task execution -
Tasks can be executed in parallel even for the same object.
But for the same object, for a given task, it should be executed in series.
For example, say the objects on which tasks are performed are A, B, A
and the tasks are T1, T2.
So T1(A) and T2(A), or T1(A) and T2(B), are possible, but T1(A) and T1(A) shouldn't be allowed.
How can I ensure that my conditions are met? I know I have to use some sort of hashing.
I read about hashing, so my hash function could be:
return ObjectIdentifier.getHashCode() + TaskIdentifier.getHashCode()
or alternatively a^3 + b^2 (where a and b are the hashes of the object identifier and task identifier respectively).
What would be the best strategy? Any suggestions?
My tasks don't involve any IO, and as of now I am using one thread for each task.
So is my current design OK, or should I optimize it based on the number of processors (i.e. a fixed number of threads)?
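One hedged note on the hashing idea: simply adding the two hash codes mixes poorly (A+B equals B+A, for instance). Something like HashCode.Combine (available on newer .NET) mixes better, and the combined hash can pick a lock out of a fixed pool ("lock striping"), so equal (object, task) pairs always serialize on the same lock while distinct pairs usually run in parallel. A minimal sketch with made-up names, assuming string object keys and int task ids:

using System;
using System.Linq;

static class PairLocks
{
    // Fixed pool of lock objects. Equal (object, task) pairs always map to the
    // same lock, so the same task on the same object runs in series; a hash
    // collision between different pairs only costs some extra serialization,
    // it never allows forbidden overlap.
    private static readonly object[] Locks =
        Enumerable.Range(0, 64).Select(_ => new object()).ToArray();

    public static object For(string objectId, int taskId)
    {
        int hash = HashCode.Combine(objectId, taskId); // better mixing than a + b
        return Locks[(hash & 0x7FFFFFFF) % Locks.Length];
    }
}

// Usage from any worker thread:
// lock (PairLocks.For("A", 1)) { /* run T1 on object A */ }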
You can do a Parallel.ForEach on one of the lists, and a regular foreach on the other list, for example:
Parallel.ForEach(myListOfObjects, currentObject =>
{
    foreach (var task in myListOfTasks)
    {
        task.DoSomething(currentObject);
    }
});
I must say that I really like Rufus L's answer. You have to be smart about the things you parallelise and not over-encumber your implementation with excessive thread synchronisation and memory-intensive constructs - those things diminish the benefit of parallelisation. Given the large size of the item pool and the CPU-bound nature of the work, Parallel.ForEach with a sequential inner loop should provide very reasonable performance while keeping the implementation dead simple. It's a win.
Having said that, I have a pretty trivial LINQ-based tweak to Rufus' answer which addresses your other requirement (namely that "for the same object, a given task should be executed in series"). The solution works provided that the following assumptions hold:
The order in which the tasks are executed is not significant.
The work to be performed (all combinations of task x object) is known in advance and cannot change.
(Sorry for stating the obvious) The work which you want to parallelise can be parallelised - i.e. there are no shared resources / side-effects are completely isolated.
With those assumptions in mind, consider the following:
// Cartesian product of the two sets (*objects* and *tasks*).
var workItems = objects.SelectMany(
    o => tasks.Select(t => new { Object = o, Task = t })
);

// Group *work items* and materialise *work item groups*.
var workItemGroups = workItems
    .GroupBy(i => i, (key, items) => items.ToArray())
    .ToArray();

Parallel.ForEach(workItemGroups, workItemGroup =>
{
    // Execute non-unique *task* x *object*
    // combinations sequentially.
    foreach (var workItem in workItemGroup)
    {
        workItem.Task.Execute(workItem.Object);
    }
});
Note that I am not limiting the degree of parallelism in Parallel.ForEach. Since all work is CPU-bound, it will work out the best number of threads on its own.
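If you ever do want to cap the degree of parallelism (for example, to leave cores free for other work), ParallelOptions supports that. A minimal sketch reusing the names above; the cap shown is an arbitrary example, not a recommendation:

var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount / 2)
};

Parallel.ForEach(workItemGroups, options, workItemGroup =>
{
    foreach (var workItem in workItemGroup)
    {
        workItem.Task.Execute(workItem.Object);
    }
});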
Related
I have a function like this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
    foreach (int number in numbers)
    {
        int result = ComputeResult(number); // This takes a long time, but is thread safe.
        AddResultToDb(number, result);      // This is quick but not thread safe.
    }
}
I could solve this problem by using, for example, Parallel.ForEach to compute the results, and then use a regular foreach to add the results to the database.
However, for educational purposes, I would like a solution that revolves around await/async. But no matter how much I read about it, I cannot wrap my mind around it. If await/async is not applicable in this context, I would like to understand why.
As others have suggested, this isn't a case of using async/await as that is for asynchrony. What you're doing is concurrency. Microsoft has a framework specifically for that and it solves this problem nicely.
So for learning purposes, you should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
    numbers
        .ToObservable()
        .SelectMany(n => Observable.Start(() => new { n, r = ComputeResult(n) }))
        .Do(x => AddResultToDb(x.n, x.r))
        .Wait();
}
The SelectMany/Observable.Start combination allows as many ComputeResult calls to occur as possible concurrently. The nice thing about Rx is that it then serializes the results so that only one call at a time goes to AddResultToDb.
To control the degree of parallelism, you can change the SelectMany to a Select/Merge, like this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
    numbers
        .ToObservable()
        .Select(n => Observable.Start(() => new { n, r = ComputeResult(n) }))
        .Merge(maxConcurrent: 2)
        .Do(x => AddResultToDb(x.n, x.r))
        .Wait();
}
The async and await pattern is not really suitable for your first method. It's well suited to IO-bound workloads to achieve scalability, or to frameworks that have UIs, for responsiveness. It's less suited to raw CPU workloads.
However, you could still get benefits from parallel processing, because ComputeResult is expensive and thread safe.
In the following example I used Parallel LINQ (PLINQ) for a fluent expression of the results, without worrying about a pre-sized array, a concurrent collection, or locking, though you could use other TPL functionality like Parallel.For/ForEach.
// Potentially break up the workloads in parallel,
// returning the number and result in a ValueTuple.
var results = numbers.AsParallel()
    .Select(x => (number: x, result: ComputeResult(x)))
    .ToList();

// Iterate through the numbers and results and execute them serially.
foreach (var (number, result) in results)
    AddResultToDb(number, result);
Note: the assumption here is that the order is not important.
Supplemental
Your method AddResultToDb looks like it's just inserting results into a database. That part is IO-bound and is worthy of async; furthermore, it could probably take all the results at once and insert them in bulk/batches, saving round trips.
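For illustration only: if the target happened to be SQL Server, the per-row inserts could be replaced with a single SqlBulkCopy round trip. A rough sketch where the method, table and column names are placeholders:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void AddResultsToDbInBulk(IEnumerable<(int number, int result)> results, string connectionString)
{
    // Stage all rows in memory, then send them to the server in one batch.
    var table = new DataTable();
    table.Columns.Add("Number", typeof(int));
    table.Columns.Add("Result", typeof(int));

    foreach (var (number, result) in results)
        table.Rows.Add(number, result);

    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Results"; // placeholder table name
        bulk.WriteToServer(table);
    }
}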
From the comments, credit @TheodorZoulias:
To preserve the order you could use the method AsOrdered, at the cost of some performance penalty. A possible performance improvement is to remove the ToList(), so that the results are added to the DB concurrently with the computations.
To make the results available as fast as possible, it's probably a good idea to disable the partial buffering that happens by default, by chaining the method .WithMergeOptions(ParallelMergeOptions.NotBuffered) into the query:
var results = numbers.AsParallel()
    .AsOrdered() // AsOrdered must be applied directly to the result of AsParallel()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .Select(x => (number: x, result: ComputeResult(x)));
Additional resources
ParallelEnumerable.AsOrdered Method
Enables treatment of a data source as if it were ordered, overriding the default of unordered. AsOrdered may only be invoked on non-generic sequences.
ParallelEnumerable.WithMergeOptions
Sets the merge options for this query, which specify how the query will buffer output.
ParallelMergeOptions Enum
NotBuffered: Use a merge without output buffers. As soon as result elements have been computed, make that element available to the consumer of the query.
This isn't really a case for async/await, because it sounds like ComputeResult is computationally expensive, as opposed to just taking a long, indeterminate amount of time. async/await is better for tasks you are truly waiting on. Parallel.ForEach will actually thread your workload.
If anything, AddResultToDb is the part you would want to make async and await - you would be waiting on an external action to complete.
Good in-depth explanation: https://stackoverflow.com/a/35485780/127257
Using Parallel.For honestly seems like the simplest solution, since your computations are likely to be CPU-bound. Async/await is better for I/O bound operations since it does not require another thread to wait for an I/O operation to complete (see there is no thread).
That being said, you can still use async/await for work that you put on the thread pool. So here's how you could do it; note the added lock, since the database call is not thread safe and the tasks run concurrently.
static readonly object dbLock = new object();

static void AddResultToDb(int number)
{
    int result = ComputeResult(number); // thread safe, can run concurrently
    lock (dbLock) // the two-argument overload is not thread safe, so serialize access to it
    {
        AddResultToDb(number, result);
    }
}

static async Task AddResultsToDb(IEnumerable<int> numbers)
{
    var tasks = numbers
        .Select(number => Task.Run(() => AddResultToDb(number)))
        .ToList();

    await Task.WhenAll(tasks);
}
I've googled this plenty but I'm afraid I don't fully understand the consequences of concurrency and parallelism.
I have about 3000 database rows, each with an average of 2-4 logical data objects attached that need to be validated as part of a search query, meaning the validation service needs to execute approximately 3 x 3000 times. E.g. if the user has filtered on color, each row needs to validate the color and return the result. The loop cannot break when a match has been found: all logical objects always need to be evaluated (because of relevance calculations, not just matching).
This is done on-demand when the user selects various properties, meaning performance is key here.
I'm currently doing this by using Parallel.ForEach but wonder if it is smarter to use async behavior instead?
Current way
var validatorService = new LogicalGroupValidatorService();
ConcurrentBag<StandardSearchResult> results = new ConcurrentBag<StandardSearchResult>();

Parallel.ForEach(searchGroups, (group) =>
{
    var searchGroupResult = validatorService.ValidateLogicGroupRecursivly(
        propertySearchQuery, group.StandardPropertyLogicalGroup);
    results.Add(new StandardSearchResult(searchGroupResult));
});
Async example code
var validatorService = new LogicalGroupValidatorService();
List<StandardSearchResult> results = new List<StandardSearchResult>();
var tasks = new List<Task<StandardPropertyLogicalGroupSearchResult>>();

foreach (var group in searchGroups)
{
    tasks.Add(validatorService.ValidateLogicGroupRecursivlyAsync(
        propertySearchQuery, group.StandardPropertyLogicalGroup));
}

await Task.WhenAll(tasks);

results = tasks.Select(logicalGroupResultTask =>
    new StandardSearchResult(logicalGroupResultTask.Result)).ToList();
The difference between parallel and async is this:
Parallel: Spin up multiple threads and divide the work over each thread
Async: Do the work in a non-blocking manner.
Whether this makes a difference depends on what it is that is blocking in the async way. If you're doing work on the CPU, it's the CPU that is blocking you, and therefore you will still end up with multiple threads. In case it's IO (or anything else besides the CPU), you will reuse the same thread.
For your particular example that means the following:
Parallel.ForEach => Partition the items across multiple threads (the number of threads that are spun up is managed by the CLR) and execute each item on a different thread
async/await => Do this bit of work, but let me continue execution. Since you have many items, that means saying this multiple times. What happens then depends on the work:
If this bit of work is on the CPU, the effect is the same
Otherwise, you'll just use a single thread while the work is being done somewhere else
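To make the contrast concrete, here is a minimal sketch of both shapes (ValidateCpuBound and ValidateIoBoundAsync are made-up stand-ins, and the second snippet assumes it runs inside an async method):

// CPU-bound: threads are the scarce resource, so divide the work across them.
Parallel.ForEach(searchGroups, group => ValidateCpuBound(group));

// IO-bound: no thread is blocked while each call is in flight.
var pending = searchGroups.Select(group => ValidateIoBoundAsync(group)).ToList();
await Task.WhenAll(pending);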
So I just can't grasp the concept here.
I have a method that uses the Parallel class with the ForEach method.
But the thing I don't understand is: does it create new threads so it can run the function faster?
Let's take this as an example.
I do a normal foreach loop.
private static void DoSimpleWork()
{
    foreach (var item in collection)
    {
        //DoWork();
    }
}
What that will do is take the first item in the list, run DoWork() on it, and wait until it finishes before moving on to the next item. Simple, plain, and it works.
Now.. There are three cases I am curious about
If I do this.
Parallel.ForEach(stringList, simpleString =>
{
    DoMagic(simpleString);
});
Will that split up the foreach into, let's say, 4 chunks?
So what I think is happening is that it takes the first 4 items in the list, assigns each string to a "thread" (assuming Parallel creates 4 virtual threads), does the work, and then starts with the next 4 in the list.
If that is wrong, please correct me; I really want to understand how this works.
And then we have this.
Which essentially is the same but with a new parameter
Parallel.ForEach(stringList, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, simpleString =>
{
    DoMagic(simpleString);
});
What I am curious about is this
new ParallelOptions() { MaxDegreeOfParallelism = 32 }
Does that mean it will take the first 32 strings from that list (if there even are that many in the list) and then do the same thing as I was talking about above?
And for the last one.
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(stringList, simpleString =>
    {
        DoMagic(simpleString);
    });
});
Would that create a new task, assigning each "chunk" to its own task?
Do not mix async code with parallel code. Task is for async operations - querying a DB, reading a file, awaiting a comparatively computation-cheap operation - so that your UI won't be blocked and unresponsive.
Parallel is different. It's designed for 1) multi-core systems and 2) computation-intensive operations. I won't go into details of how it works; that kind of info can be found in the MS documentation. Long story short, Parallel.For will most probably make its own decision about what exactly to run, when, and how. It might disobey your parameters, i.e. MaxDegreeOfParallelism or anything else, because the whole idea is to provide the best possible parallelization and thus complete your operation as fast as possible.
Parallel.ForEach performs the equivalent of a C# foreach loop, but with each iteration executing in parallel instead of sequentially. There is no guaranteed sequencing; it depends on whether the OS can find an available thread, and if there is one, the iteration will execute.
MaxDegreeOfParallelism
By default, For and ForEach will utilize as many threads as the OS provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used by the application.
You do not need to modify this parameter in general, but you may choose to change it in advanced scenarios (see the sketch after this list):
When you know that a particular algorithm you're using won't scale beyond a certain number of cores. You can set the property to avoid wasting cycles on additional cores.
When you're running multiple algorithms concurrently and want to manually define how much of the system each algorithm can utilize.
When the thread pool's heuristics is unable to determine the right number of threads to use and could end up injecting too many threads. E.g. in long-running loop body iterations, the thread pool might not be able to tell the difference between reasonable progress or livelock or deadlock, and might not be able to reclaim threads that were added to improve performance. You can set the property to ensure that you don't use more than a reasonable number of threads.
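As a concrete illustration of capping the loop from the question (a sketch only; the right number depends on measurement):

// Cap the loop at the machine's core count instead of letting the
// thread pool keep injecting extra threads for long-running iterations.
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

Parallel.ForEach(stringList, options, simpleString =>
{
    DoMagic(simpleString);
});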
Task.Factory.StartNew is usually used when you require fine-grained control for a long-running, compute-bound task, and as @Сергей Боголюбов mentioned, do not mix them up.
It creates a new task, and that task runs the Parallel.ForEach loop, which in turn schedules its iterations on thread-pool threads.
You may find this ebook useful: http://www.albahari.com/threading/#_Introduction
does the work and then starts with the next 4 in that list?
This depends on your machine's hardware and on how busy its cores are with other processes/apps your CPU is working on.
Does that mean it will take the first 32 strings from that list (if there even are that many in the list) and then do the same thing as I was talking about above?
No, there is no guarantee that it will take the first 32; it could be fewer. It will vary each time you execute the same code.
Task.Factory.StartNew creates a new task, but it will not create a new one for each chunk as you expect.
Putting a Parallel.ForEach inside a new Task will not help you further reduce the time taken by the parallel work itself.
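The one thing the wrapper does buy you is that the calling thread isn't blocked while the loop runs, e.g. to keep a UI thread responsive. A sketch, assuming it is awaited from an async method:

// The loop still runs on the same thread pool; the wrapper only moves
// the blocking Parallel.ForEach call off the current thread.
await Task.Run(() =>
{
    Parallel.ForEach(stringList, simpleString => DoMagic(simpleString));
});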
I have a .NET 4.5 Single Instance WCF service which maintains a collection of items in a list which will have simultaneous concurrent readers and writers, but with far more readers than writers.
I am currently deciding whether to use the BCL ConcurrentBag<T> or my own custom generic ThreadSafeList class (which implements IList<T> and encapsulates the BCL ReaderWriterLockSlim, as this is more suited to multiple concurrent readers).
I have found significant performance differences when testing these implementations by simulating a concurrent scenario of 1M readers (each simply running a Sum LINQ query) and only 100 writers (adding items to the list).
For my performance test I have a list of tasks:
List<Task> tasks = new List<Task>();
Test 1: If I create 1m reader tasks followed by 100 writer tasks using the following code:
tasks.AddRange(Enumerable.Range(0, 1000000).Select(n => new Task(() => { temp.Where(t => t < 1000).Sum(); })).ToArray());
tasks.AddRange(Enumerable.Range(0, 100).Select(n => new Task(() => { temp.Add(n); })).ToArray());
I get the following timing results:
ConcurrentBag: ~300ms
ThreadSafeList: ~520ms
Test 2: However, if I create 1M reader tasks mixed with 100 writer tasks (whereby the list of tasks to be executed could be {Reader, Reader, Writer, Reader, Reader, Writer, etc.}):
foreach (var item in Enumerable.Range(0, 1000000))
{
    tasks.Add(new Task(() => temp.Where(t => t < 1000).Sum()));

    if (item % 10000 == 0)
        tasks.Add(new Task(() => temp.Add(item)));
}
I get the following timing results:
ConcurrentBag: ~4000ms
ThreadSafeList: ~800ms
My code for getting the execution time for each test is as follows:
Stopwatch watch = new Stopwatch();
watch.Start();
tasks.ForEach(task => task.Start());
Task.WaitAll(tasks.ToArray());
watch.Stop();
Console.WriteLine("Time: {0}ms", watch.Elapsed.TotalMilliseconds);
The efficiency of ConcurrentBag is better in Test 1 and worse in Test 2 than my custom implementation, so I'm finding it a difficult decision which one to use.
Q1. Why are the results so different when the only thing I’m changing is the ordering of the tasks within the list?
Q2. Is there a better way to change my test to make it more fair?
Why are the results so different when the only thing I'm changing is the ordering of the tasks within the list?
My best guess is that Test #1 does not actually read items, as there is nothing to read. The order of task execution is:
Read from shared pool 1M times and calculate sum
Write to shared pool 100 times
Your Test #2 mixes the reads and writes, and this is why, I am guessing, you get a different result.
Is there a better way to change my test to make it more fair?
Before you start the tasks, try randomising the order of the tasks. It might be difficult to reproduce the same result, but you may get closer to real-world usage.
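A sketch of one way to do that, shuffling the task list in place with a Fisher-Yates pass before any task is started (assuming the tasks list from your test code):

var rng = new Random();

// Fisher-Yates shuffle: every permutation of the task list is equally likely.
for (int i = tasks.Count - 1; i > 0; i--)
{
    int j = rng.Next(i + 1);
    (tasks[i], tasks[j]) = (tasks[j], tasks[i]);
}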
Ultimately, your question is about the difference between optimistic concurrency (the Concurrent* classes) and pessimistic concurrency (based on a lock). As a rule of thumb, when the chances of simultaneous access to a shared resource are low, prefer optimistic concurrency; when the chances are high, prefer the pessimistic kind.
I have 3 main processing threads, each of them performing operations on the values of a ConcurrentDictionary by means of Parallel.ForEach. The dictionaries vary in size from 1,000 elements to 250,000 elements.
TaskFactory factory = new TaskFactory();

Task t1 = factory.StartNew(() =>
{
    Parallel.ForEach(dict1.Values, item => ProcessItem(item));
});

Task t2 = factory.StartNew(() =>
{
    Parallel.ForEach(dict2.Values, item => ProcessItem(item));
});

Task t3 = factory.StartNew(() =>
{
    Parallel.ForEach(dict3.Values, item => ProcessItem(item));
});

t1.Wait();
t2.Wait();
t3.Wait();
I compared the total execution time of this construct with just running the Parallel.ForEach calls in the main thread, and the performance improved a lot: running them in the main thread reduced the execution time approximately 5 times.
My questions are:
Is there something wrong with the approach above? If yes, what is it, and how can it be improved?
What is the reason for the different execution times?
What is a good way to debug/analyze such a situation?
EDIT: To further clarify the situation: I am mocking the client calls on a WCF service, each of which comes in on a separate thread (the reason for the Tasks). I also tried to use ThreadPool.QueueUserWorkItem instead of Task, without a performance improvement. The objects in the dictionaries have between 20 and 200 properties (just decimals and strings) and there is no I/O activity.
I solved the problem by queuing the processing requests in a BlockingCollection and processing them one at a time.
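For anyone curious, a minimal sketch of that queue-and-single-consumer shape (Request is a made-up stand-in for whatever each client call carries):

using System.Collections.Concurrent;
using System.Threading.Tasks;

var queue = new BlockingCollection<Request>();

// Single consumer: requests are processed one at a time, and each request
// still gets the full parallelism of its own Parallel.ForEach.
var consumer = Task.Factory.StartNew(() =>
{
    foreach (var request in queue.GetConsumingEnumerable())
        Parallel.ForEach(request.Dictionary.Values, item => ProcessItem(item));
}, TaskCreationOptions.LongRunning);

// Producers (the per-client threads) just enqueue work:
// queue.Add(new Request(dict1));
// ...
// queue.CompleteAdding(); // when no more requests are coming
// consumer.Wait();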
You're probably over-parallelizing.
You don't need to create 3 tasks if you already use good (and balanced) parallelization inside each one of them.
Parallel.ForEach already tries to use the right number of threads to exploit the full CPU potential without saturating it, and by creating other tasks that each contain a Parallel.ForEach you're probably saturating it.
(EDIT: as Henk said, they probably have some problems coordinating the number of threads to spawn when run in parallel, and at the very least this leads to bigger overhead.)
Have a look here for some hints.
First of all, a Task is not a Thread.
Your Parallel.ForEach() calls are run by a scheduler that uses the ThreadPool and should try to optimize thread usage. The ForEach applies a Partitioner. When you run these loops in parallel, they cannot coordinate very well.
Only consider helping with extra tasks or DegreeOfParallelism directives if there is a performance problem. And then always profile and analyze first.
An explanation of your results is difficult; it could be caused by many factors (I/O, for example), but the advantage of the 'single main task' approach is that the scheduler has more control and the CPU and cache are used better (locality).
The dictionaries vary widely in size and, by the looks of it (given everything finishes in under 5 s), the amount of processing work per item is small. Without knowing more, it's hard to say what's actually going on. How big are your dictionary items? The main-thread scenario you're comparing this to looks like this, right?
Parallel.ForEach(dict1.Values, item => ProcessItem(item));
Parallel.ForEach(dict2.Values, item => ProcessItem(item));
Parallel.ForEach(dict3.Values, item => ProcessItem(item));
By adding the Tasks around each ForEach, you're adding more overhead to manage the tasks and probably causing memory contention as dict1, dict2 and dict3 all try to be in memory and hot in cache at the same time. Remember: CPU cycles are cheap, cache misses are not.