I have this code
List<string> myList = new List<string>();
myList.AddRange(new MyClass1().Load());
myList.AddRange(new MyClass2().Load());
myList.AddRange(new MyClass3().Load());
myList.DoSomethingWithValues();
What's the best way to run an arbitrary number of Load() methods asynchronously and then ensure DoSomethingWithValues() runs once all the asynchronous threads have completed (without, of course, incrementing a variable every time a callback happens and waiting for it to reach 3)?
My personal favorite would be:
List<string> myList = new List<string>();
var task1 = Task.Factory.StartNew( () => new MyClass1().Load() );
var task2 = Task.Factory.StartNew( () => new MyClass2().Load() );
var task3 = Task.Factory.StartNew( () => new MyClass3().Load() );
myList.AddRange(task1.Result);
myList.AddRange(task2.Result);
myList.AddRange(task3.Result);
myList.DoSomethingWithValues();
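For an arbitrary number of loaders, the same idea can be sketched with Task.Run and Task.WhenAll (both need .NET 4.5 or later). This is only an illustration: ILoadable, FixedLoader and LoaderDemo are hypothetical names standing in for MyClass1/2/3, not types from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical interface standing in for MyClass1/2/3 from the question.
public interface ILoadable
{
    IEnumerable<string> Load();
}

// Hypothetical loader that simply returns the values it was given.
public sealed class FixedLoader : ILoadable
{
    private readonly string[] _values;
    public FixedLoader(params string[] values) => _values = values;
    public IEnumerable<string> Load() => _values;
}

public static class LoaderDemo
{
    // Starts one task per loader, then waits for all of them without blocking.
    public static async Task<List<string>> LoadAllAsync(IEnumerable<ILoadable> loaders)
    {
        // ToList() inside the task forces a lazy Load() to run on the pool thread;
        // ToArray() outside starts every task before any of them is awaited.
        Task<List<string>>[] tasks = loaders
            .Select(l => Task.Run(() => l.Load().ToList()))
            .ToArray();

        // WhenAll returns the results in the same order as the tasks.
        List<string>[] results = await Task.WhenAll(tasks);
        return results.SelectMany(r => r).ToList();
    }

    public static async Task Main()
    {
        var loaders = new ILoadable[]
        {
            new FixedLoader("a", "b"),
            new FixedLoader("c"),
            new FixedLoader("d", "e"),
        };
        List<string> myList = await LoadAllAsync(loaders);
        Console.WriteLine(string.Join(",", myList)); // prints a,b,c,d,e
    }
}
```

DoSomethingWithValues() would then be called on the returned list, which is only available once every Load() has finished.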
How about PLINQ?
var loadables = new ILoadable[]
{ new MyClass1(), new MyClass2(), new MyClass3() };
var loadResults = loadables.AsParallel()
.SelectMany(l => l.Load());
myList.AddRange(loadResults);
myList.DoSomethingWithValues();
EDIT: Changed Select to SelectMany as pointed out by Reed Copsey.
Ani's conceptual solution can be written more concisely:
new ILoadable[] { new MyClass1(), new MyClass2(), new MyClass3() }
.AsParallel().SelectMany(o => o.Load()).ToList()
.DoSomethingWithValues();
That's my preferred solution: declarative (AsParallel) and concise.
Reed's solution, when written in this fashion, looks as follows:
new ILoadable[] { new MyClass1(), new MyClass2(), new MyClass3() }
.Select(o=>Task.Factory.StartNew(()=>o.Load().ToArray())).ToArray()
.SelectMany(t=>t.Result).ToList()
.DoSomethingWithValues();
Note that both ToArray calls may be necessary. The first call is necessary if o.Load is lazy (which in general it can be, though YMMV) to ensure evaluation of o.Load is completed inside the background task. The second call is necessary to ensure the list of tasks has been fully constructed before the call to SelectMany - if you don't do this, then SelectMany will attempt to iterate over its source only as necessary - i.e. it won't iterate to the second task before it has to, and that's not until the first task's Result has been computed. Effectively, you're starting tasks but then lazily executing them one after the other - turning background tasks back into a strictly sequential execution.
Note that the second, less declarative solution has many more pitfalls and requires a much more thorough analysis to be sure it's correct - i.e., this is less maintainable, though still miles better than manual threading. Incidentally, you may be able to get away with leaving out the calls to .ToList - that depends on the details of DoSomethingWithValues - for even better performance, whereby your final processing can access the first values as they trickle in without needing to wait for all tasks or parallel enumerables to complete. And that's even shorter to boot!
Unless there's a compelling reason to run them all at once, I'd suggest you just run them all in a single asynchronous method.
A compelling reason might be heavy disk or database I/O, where running more than one background thread would actually let the loads overlap. If most of the initialization is code logic, you might find that multiple threads actually result in slower performance.
Related
This sounds like an overly trivial question, and I think I am overcomplicating it because I haven't been able to find the answer for months. There are easy ways of doing this in Golang, Scala/Akka, etc but I can't seem to find anything in .NET.
What I need is an ability to have a list of Tasks that are all independent of each other, and the ability to execute them concurrently on a specified (and easily changeable) number of threads.
Basically something like:
int numberOfParallelThreads = 3; // changeable
Queue<Task> pendingTasks = GetPendingTasks(); // returns 80 items
await SomeBuiltInDotNetParallelExecutableManager.RunAllTasksWithSpecifiedConcurrency(pendingTasks, numberOfParallelThreads);
And that SomeBuiltInDotNetParallelExecutableManager would execute 80 tasks three at a time; i.e. when one finishes it draws the next one from the queue, until the queue is exhausted.
There is Task.WhenAll and Task.WaitAll, but you can't specify the max number of parallel threads in them.
Is there a built in, simple way to do this?
Parallel.ForEachAsync, available since .NET 6 (or, depending on the actual workload, its synchronous counterpart Parallel.ForEach, although that will not correctly handle functions returning Task):
IEnumerable<int> x = ...;
await Parallel.ForEachAsync(x, new ParallelOptions
{
MaxDegreeOfParallelism = 3
}, async (i, token) => await Task.Delay(i * 1000, token));
Also, it is highly recommended that methods in C# return so-called "hot" (i.e. already started) tasks. Idiomatically, then, Queue<Task> would be a collection of already started tasks, and you would have no control over how many of them execute in parallel, because that is decided by the ThreadPool/TaskScheduler.
There is also a port of Akka to .NET, Akka.NET, if you want to go down that route.
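Where Parallel.ForEachAsync is not available, one common alternative is to gate the work with a SemaphoreSlim. The sketch below is not a built-in API: Throttle, RunAllAsync and MeasurePeakAsync are hypothetical names, and the work items are assumed to be Func<Task> delegates (not yet started), so that nothing runs before a slot is free.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs every work item, allowing at most maxConcurrency to be in flight at once.
    public static async Task RunAllAsync(IEnumerable<Func<Task>> workItems, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var running = new List<Task>();
            foreach (Func<Task> work in workItems)
            {
                await gate.WaitAsync();               // wait for a free slot
                running.Add(RunOneAsync(work, gate)); // start the item, keep its task
            }
            await Task.WhenAll(running);              // wait for the stragglers
        }
    }

    private static async Task RunOneAsync(Func<Task> work, SemaphoreSlim gate)
    {
        try { await work(); }
        finally { gate.Release(); }                   // free the slot even on failure
    }

    // Demo helper: runs itemCount fake work items and reports the highest
    // number that were in flight at the same time.
    public static async Task<int> MeasurePeakAsync(int itemCount, int maxConcurrency)
    {
        var sync = new object();
        int current = 0, peak = 0;
        IEnumerable<Func<Task>> items = Enumerable.Range(0, itemCount)
            .Select(_ => new Func<Task>(async () =>
            {
                lock (sync) { current++; peak = Math.Max(peak, current); }
                await Task.Delay(20);                 // simulated asynchronous work
                lock (sync) { current--; }
            }));
        await RunAllAsync(items, maxConcurrency);
        return peak;
    }

    public static async Task Main()
    {
        int peak = await MeasurePeakAsync(itemCount: 10, maxConcurrency: 3);
        Console.WriteLine($"Peak concurrency: {peak}");
    }
}
```

This limits how many asynchronous operations are in flight rather than how many threads are used; for CPU-bound work that never awaits, Parallel.ForEach with MaxDegreeOfParallelism is likely the better fit.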
Microsoft's Reactive Framework makes this easy too:
IEnumerable<int> values = ...;
IDisposable subscription =
values
.ToObservable()
.Select(v => Observable.Defer(() => Observable.Start(() => { /* do work on each value */ })))
.Merge(3)
.Subscribe();
I have a function like this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
foreach (int number in numbers)
{
int result = ComputeResult(number); // This takes a long time, but is thread safe.
AddResultToDb(number, result); // This is quick but not thread safe.
}
}
I could solve this problem by using, for example, Parallel.ForEach to compute the results, and then use a regular foreach to add the results to the database.
However, for educational purposes, I would like a solution that revolves around await/async. But no matter how much I read about it, I cannot wrap my mind around it. If await/async is not applicable in this context, I would like to understand why.
As others have suggested, this isn't a case of using async/await as that is for asynchrony. What you're doing is concurrency. Microsoft has a framework specifically for that and it solves this problem nicely.
So for learning purposes, you should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
numbers
.ToObservable()
.SelectMany(n => Observable.Start(() => new { n, r = ComputeResult(n) }))
.Do(x => AddResultToDb(x.n, x.r))
.Wait();
}
The SelectMany/Observable.Start combination allows as many ComputeResult calls to occur as possible concurrently. The nice thing about Rx is that it then serializes the results so that only one call at a time goes to AddResultToDb.
To control the degrees of parallelism you can change the SelectMany to a Select/Merge like this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
numbers
.ToObservable()
.Select(n => Observable.Start(() => new { n, r = ComputeResult(n) }))
.Merge(maxConcurrent: 2)
.Do(x => AddResultToDb(x.n, x.r))
.Wait();
}
The async and await pattern is not really suitable for your first method. It's well suited to I/O-bound workloads, where it helps scalability, and to UI frameworks, where it keeps the interface responsive. It's less suited to raw CPU workloads.
However, you could still benefit from parallel processing, because your first method is expensive and thread safe.
In the following example I used Parallel LINQ (PLINQ) for a fluent expression of the results without worrying about a pre-sized array, a concurrent collection or locking, though you could use other TPL functionality, like Parallel.For/ForEach.
// Potentially break up the workloads in parallel
// return the number and result in a ValueTuple
var results = numbers.AsParallel()
.Select(x => (number: x, result: ComputeResult(x)))
.ToList();
// iterate through the number and results and execute them serially
foreach (var (number, result) in results)
AddResultToDb(number, result);
Note: this assumes that the order is not important.
Supplemental
Your method AddResultToDb looks like it just inserts results into a database, which is I/O-bound and therefore a good candidate for async. Furthermore, it could probably take all the results at once and insert them in bulk, saving round trips.
From the comments, credit to @TheodorZoulias:
To preserve the order you could use the method AsOrdered, at
the cost of some performance penalty. A possible performance
improvement is to remove the ToList(), so that the results are added
to the DB concurrently with the computations.
To make the results available as fast as possible it's probably a good
idea to disable the partial buffering that happens by default, by
chaining the method
.WithMergeOptions(ParallelMergeOptions.NotBuffered) in the query
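// AsOrdered must be applied directly to the result of AsParallel();
// calling it later in the chain throws an InvalidOperationException.
var results = numbers.AsParallel()
    .AsOrdered()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .Select(x => (number: x, result: ComputeResult(x)));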
Additional resources
ParallelEnumerable.AsOrdered Method
Enables treatment of a data source as if it were ordered, overriding
the default of unordered. AsOrdered may only be invoked on non-generic
sequences
ParallelEnumerable.WithMergeOptions
Sets the merge options for this query, which specify how the query
will buffer output.
ParallelMergeOptions Enum
NotBuffered Use a merge without output buffers. As soon as result elements have been computed, make that element available to the
consumer of the query.
This isn't really a case for async/await because it sounds like ComputeResult is expensive computationally, as opposed to just taking a long, indeterminate amount of time. async/await is better for tasks you are truly waiting on. Parallel.ForEach will actually spread your workload across threads.
If anything, AddResultToDb is what you would want to async/await - you would be waiting on an external action to complete.
Good in-depth explanation: https://stackoverflow.com/a/35485780/127257
Using Parallel.For honestly seems like the simplest solution, since your computations are likely to be CPU-bound. Async/await is better for I/O-bound operations, since it does not require another thread to wait for an I/O operation to complete (see "There Is No Thread").
That being said, you can still use async/await for tasks that you put on the thread pool. So here's how you could do it.
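static readonly object DbLock = new object();

static void AddResultToDb(int number)
{
    int result = ComputeResult(number);
    // AddResultToDb(number, result) is not thread safe, so only one task may call it at a time.
    lock (DbLock)
    {
        AddResultToDb(number, result);
    }
}

static async Task AddResultsToDb(IEnumerable<int> numbers)
{
    // Start one thread-pool task per number; ToList() forces them all to start now.
    var tasks = numbers
        .Select(number => Task.Run(() => AddResultToDb(number)))
        .ToList();
    await Task.WhenAll(tasks);
}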
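Since the question describes AddResultToDb(number, result) as not thread safe, another option is to keep the database writes out of the parallel part entirely: compute on the thread pool and write the results one at a time as they are awaited. A minimal sketch, in which DbDemo, FakeDb and the bodies of ComputeResult and AddResultToDb are placeholders rather than the asker's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DbDemo
{
    // Placeholder "database": a plain list, which is not thread safe.
    public static readonly List<(int Number, int Result)> FakeDb =
        new List<(int Number, int Result)>();

    // Placeholder for the slow but thread-safe computation.
    static int ComputeResult(int number) => number * number;

    // Placeholder for the quick, non-thread-safe write.
    static void AddResultToDb(int number, int result) => FakeDb.Add((number, result));

    public static async Task AddResultsToDbAsync(IEnumerable<int> numbers)
    {
        // Start all computations on thread-pool threads.
        Task<(int Number, int Result)>[] tasks = numbers
            .Select(n => Task.Run(() => (Number: n, Result: ComputeResult(n))))
            .ToArray();

        // Await the results in input order; the writes happen one after
        // another in this single flow, never concurrently.
        foreach (var task in tasks)
        {
            var (number, result) = await task;
            AddResultToDb(number, result);
        }
    }

    public static async Task Main()
    {
        await AddResultsToDbAsync(new[] { 1, 2, 3 });
        Console.WriteLine(FakeDb.Count); // prints 3
    }
}
```

Awaiting in input order means an early slow computation delays later writes; that seems an acceptable trade for a sketch, and it keeps the rows in the original order.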
I've googled this plenty but I'm afraid I don't fully understand the consequences of concurrency and parallelism.
I have about 3000 rows of database objects, each with an average of 2-4 pieces of logical data attached that need to be validated as part of a search query, meaning the validation service needs to execute approx. 3*3000 times. E.g. if the user has filtered on color, then each row needs to validate the color and return the result. The loop cannot break when a match has been found, meaning all logical objects will always need to be evaluated (this is because relevance is calculated, not just whether there is a match).
This is done on-demand when the user selects various properties, meaning performance is key here.
I'm currently doing this by using Parallel.ForEach but wonder if it is smarter to use async behavior instead?
Current way
var validatorService = new LogicalGroupValidatorService();
ConcurrentBag<StandardSearchResult> results = new ConcurrentBag<StandardSearchResult>();
Parallel.ForEach(searchGroups, (group) =>
{
var searchGroupResult = validatorService.ValidateLogicGroupRecursivly(
propertySearchQuery, group.StandardPropertyLogicalGroup);
results.Add(new StandardSearchResult(searchGroupResult));
});
Async example code
var validatorService = new LogicalGroupValidatorService();
List<StandardSearchResult> results = new List<StandardSearchResult>();
var tasks = new List<Task<StandardPropertyLogicalGroupSearchResult>>();
foreach (var group in searchGroups)
{
tasks.Add(validatorService.ValidateLogicGroupRecursivlyAsync(
propertySearchQuery, group.StandardPropertyLogicalGroup));
}
await Task.WhenAll(tasks);
results = tasks.Select(logicalGroupResultTask =>
new StandardSearchResult(logicalGroupResultTask.Result)).ToList();
The difference between parallel and async is this:
Parallel: Spin up multiple threads and divide the work over each thread
Async: Do the work in a non-blocking manner.
Whether this makes a difference depends on what is blocking in the async case. If you're doing work on the CPU, it's the CPU that is blocking you, and therefore you will still end up with multiple threads. If it's I/O (or anything else besides the CPU), you will reuse the same thread.
For your particular example that means the following:
Parallel.ForEach => Partition the list and process the items on multiple threads (the number of threads used is managed by the runtime)
async/await => Do this bit of work, but let me continue execution in the meantime. Since you have many items, that means saying this many times. What happens next depends on the kind of work:
If this bit of work is on the CPU, the effect is the same
Otherwise, you'll just use a single thread while the work is being done somewhere else
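The distinction can be illustrated with a small sketch. It assumes Task.Delay as a stand-in for real I/O and a simple loop as a stand-in for CPU-bound validation; AsyncVsParallel and its members are hypothetical names, and the timings it prints will vary by machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVsParallel
{
    // Simulated I/O: the wait is handled by a timer, so no thread is blocked meanwhile.
    public static Task IoBoundAsync() => Task.Delay(100);

    // Simulated CPU work: keeps a core busy until the loop finishes.
    public static long CpuBound(int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i % 7;
        return sum;
    }

    // CPU-bound items only get faster when they are spread over several threads.
    public static long SumInParallel(int items, int n)
    {
        long total = 0;
        Parallel.For(0, items, _ => Interlocked.Add(ref total, CpuBound(n)));
        return total;
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        // 1000 pending waits need no dedicated threads while they are in flight.
        await Task.WhenAll(Enumerable.Range(0, 1000).Select(_ => IoBoundAsync()));
        Console.WriteLine($"1000 async waits: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        long total = SumInParallel(32, 5_000_000);
        Console.WriteLine($"32 CPU-bound items: {sw.ElapsedMilliseconds} ms (sum {total})");
    }
}
```

If the validation in the question is pure in-memory computation, it resembles the second case, and an async wrapper would still need threads to do the work.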
I have a task that essentially loops through a collection and does an operation on them in pairs (for int i = 0; i < limit; i+=2 etc.) And so, most suggestions I see on threading loops use some sort of foreach mechanism. But that seems a bit tricky to me, seeing as how I use this approach of operating in pairs.
So what I would want to do is essentially replace:
DoOperation(list.Take(numberToProcess));
with
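Thread lowerHalf = new Thread(() => DoOperation(list.Take(numberToProcess / 2)));
// GetRange takes (index, count), so the second argument is the size of the upper half
Thread upperHalf = new Thread(() => DoOperation(list.GetRange(numberToProcess / 2, numberToProcess - numberToProcess / 2)));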
lowerHalf.Start();
upperHalf.Start();
And this seems to get the work done, but it's VERY slow. Every iteration is slower than the previous one, and when I debug, the Thread view shows a growing list of Threads.
But I was under the impression that Threads terminated themselves upon completion? And yes, the threads do complete. The DoOperation() method is pretty much just a for loop.
So what am I not understanding here?
Try Parallel.For. It will save a lot of work.
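Parallel.For has no step parameter, but a loop that advances by two can be expressed by iterating over pair indices instead. A minimal sketch, assuming each pair can be processed independently; PairDemo and SumPairs are hypothetical names, and summing stands in for whatever DoOperation does with a pair:

```csharp
using System;
using System.Threading.Tasks;

public static class PairDemo
{
    // Processes elements two at a time: (0,1), (2,3), ...
    public static int[] SumPairs(int[] data)
    {
        int pairCount = data.Length / 2;      // a trailing unpaired element is ignored
        var sums = new int[pairCount];
        Parallel.For(0, pairCount, p =>
        {
            int i = 2 * p;                    // map the pair index back to an element index
            sums[p] = data[i] + data[i + 1];  // each iteration writes only its own slot
        });
        return sums;
    }

    public static void Main()
    {
        int[] sums = SumPairs(new[] { 1, 2, 3, 4, 5, 6 });
        Console.WriteLine(string.Join(",", sums)); // prints 3,7,11
    }
}
```

Because the work runs on pooled threads, no new Thread objects are created per iteration, which avoids the growing thread list described in the question.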
To explain pranitkothari's answer a little bit more and give a different example you can use
list.AsParallel().ForAll(delegate([ListContainingType] item) {
// do stuff to a single item here (whatever is done in DoOperation() in your code
// except applied to a single item rather than several)
});
For instance, if I had a list of strings, it would be
List<String> list = new List<String>();
list.AsParallel().ForAll(delegate(String item) {
// do stuff to a single item here (whatever is done in DoOperation() in your code
// except applied to a single item rather than several)
});
This will let you perform an operation on each item in the list in parallel across multiple threads. It's simpler in that it handles all the "multi-threadedness" for you.
This is a good post that explains one of the differences in them
I have a function which is along the lines of
private void DoSomethingToFeed(IFeed feed)
{
feed.SendData(); // Send data to remote server
Thread.Sleep(1000 * 60 * 5); // Sleep 5 minutes
feed.GetResults(); // Get data from remote server after it's processed it
}
I want to parallelize this, since I have lots of feeds that are all independent of each other. Based on this answer, leaving the Thread.Sleep() in there is not a good idea. I also want to wait after all the threads have spun up, until they've all had a chance to get their results.
What's the best way to handle a scenario like this?
Edit, because I accidentally left it out: I had originally considered calling this function as Parallel.ForEach(feeds, DoSomethingToFeed), but I was wondering if there was a better way to handle the sleeping when I found the answer I linked to.
Unless you have an awful lot of threads, you can keep it simple. Create all the threads. You'll get some thread creation overhead, but since the threads are basically sleeping the whole time, you won't get too much context switching.
It'll be easier to code than any other solution (unless you're using C# 5). So start with that, and improve it only if you actually see a performance problem.
I think you should take a look at the Task class in .NET. It is a nice abstraction on top of more low level threading / thread pool management.
In order to wait for all tasks to complete, you can use Task.WaitAll.
An example use of Tasks could look like:
IFeed feedOne = new SomeFeed();
IFeed feedTwo = new SomeFeed();
var t1 = Task.Factory.StartNew(() => { feedOne.SendData(); });
var t2 = Task.Factory.StartNew(() => { feedTwo.SendData(); });
// Waits for all provided tasks to finish execution
Task.WaitAll(t1, t2);
However, another solution would be to use Parallel.ForEach, which handles all Task creation for you and batches the tasks appropriately as well. A good comparison of the two approaches is given here, which states, among other good points, that:
Parallel.ForEach, internally, uses a Partitioner to distribute your collection into work items. It will not do one task per item, but rather batch this to lower the overhead involved.
Check WaitHandle for waiting on tasks.
private void DoSomethingToFeed(IFeed feed)
{
Task.Factory.StartNew(() => feed.SendData())
.ContinueWith(_ => Delay(1000 * 60 * 5)
.ContinueWith(__ => feed.GetResults())
);
}
//http://stevenhollidge.blogspot.com/2012/06/async-taskdelay.html
Task Delay(int milliseconds)
{
var tcs = new TaskCompletionSource<object>();
new System.Threading.Timer(_ => tcs.SetResult(null)).Change(milliseconds, -1);
return tcs.Task;
}
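With C# 5 or later, the same send, wait, fetch sequence can be written with async/await and Task.Delay, so no thread sleeps during the five minutes. This is a sketch under assumptions: the IFeed shown here is a minimal stand-in for the question's interface, and CountingFeed and FeedRunner are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Minimal stand-in for the question's IFeed; the real interface may differ.
public interface IFeed
{
    void SendData();
    void GetResults();
}

// Hypothetical feed that only counts calls, for demonstration.
public sealed class CountingFeed : IFeed
{
    public int Sent { get; private set; }
    public int Received { get; private set; }
    public void SendData() => Sent++;
    public void GetResults() => Received++;
}

public static class FeedRunner
{
    public static async Task DoSomethingToFeedAsync(IFeed feed, TimeSpan wait)
    {
        feed.SendData();          // runs synchronously on the calling thread
        await Task.Delay(wait);   // timer-based wait; no thread is held meanwhile
        feed.GetResults();
    }

    // Starts every feed, then completes when all of them have their results.
    public static Task ProcessAllAsync(IEnumerable<IFeed> feeds, TimeSpan wait) =>
        Task.WhenAll(feeds.Select(f => DoSomethingToFeedAsync(f, wait)));

    public static async Task Main()
    {
        var feeds = new List<IFeed> { new CountingFeed(), new CountingFeed() };
        // A short wait for demonstration; the question uses five minutes.
        await ProcessAllAsync(feeds, TimeSpan.FromMilliseconds(100));
        Console.WriteLine("All feeds processed");
    }
}
```

SendData() is called for each feed in turn before any waiting starts; if it is slow, wrapping the body in Task.Run is one way to overlap those calls as well.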