Parallel computation in .NET 4.0 (C#)

I'm taking my first steps in parallel programming. I rewrote CalculateSlots as CalculateSlotsAsync. It seems to work fine (about 3 times faster).
My questions are: Is it written correctly?
Do I need to use the newer async/await pattern, and if so, how?
private void CalculateSlots(bool isCalculateAllSlots)
{
    foreach (IndicatorSlot indicatorSlot in strategy.Slot)
    {
        if (isCalculateAllSlots || !indicatorSlot.IsCalculated)
            CalculateStrategySlot(indicatorSlot.SlotNumber);
    }
}
private void CalculateSlotsAsync(bool isCalculateAllSlots)
{
    var tasks = new List<Task>();
    foreach (IIndicatorSlot indicatorSlot in strategy.Slot)
    {
        if (isCalculateAllSlots || !indicatorSlot.IsCalculated)
        {
            IIndicatorSlot slot = indicatorSlot;
            Task task = Task.Factory.StartNew(() => CalculateStrategySlot(slot.SlotNumber));
            tasks.Add(task);
        }
    }
    Task.WaitAll(tasks.ToArray());
}
Test on i7-3630QM @ 2.40 GHz:
// Executed for 96 sec.
for (int i = 0; i < 1000; i++)
    CalculateSlots(true);
// Executed for 34 sec.
for (int i = 0; i < 1000; i++)
    CalculateSlotsAsync(true);

For data-parallel operations, you can often simplify your implementation by using PLINQ:
strategy.Slot.AsParallel()
    .Where(slot => isCalculateAllSlots || !slot.IsCalculated)
    .ForAll(slot => CalculateStrategySlot(slot.SlotNumber));
However, in your case, each item takes a relatively long time to compute, so I would recommend leaving them as tasks but marking them as LongRunning (which typically has the effect of executing them on a dedicated thread, rather than the thread pool).
Task task = Task.Factory.StartNew(() => CalculateStrategySlot(slot.SlotNumber),
    TaskCreationOptions.LongRunning);
Reply: Task.WaitAll causes the calling thread – in your case, the UI thread – to block until all specified tasks have completed. (The behaviour is similar for the PLINQ ForAll.)
In order for your UI to remain responsive, you need to switch from a blocking approach to an asynchronous one. For example, suppose you have:
Task.WaitAll(tasks.ToArray());
UpdateUI(strategy.Slot); // must be called on UI thread
You can replace this with:
Task.Factory.ContinueWhenAll(tasks.ToArray(), completedTasks =>
    {
        // callback on UI thread
        UpdateUI(strategy.Slot);
    },
    CancellationToken.None,
    TaskContinuationOptions.None,
    TaskScheduler.FromCurrentSynchronizationContext());
In practice, you'll also need to learn how to use CancellationToken to allow the user to discard the operation before it completes.
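As a rough illustration of cooperative cancellation with .NET 4.0 tasks, here is a minimal sketch. CalculateSlot and RunAndCancel are hypothetical names (CalculateSlot only stands in for the real CalculateStrategySlot), and the blocking Wait is there just to keep the sketch short; in a UI application you would observe the result in a continuation instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Hypothetical stand-in for CalculateStrategySlot: checks the token between chunks of work.
    public static int CalculateSlot(int slotNumber, CancellationToken token)
    {
        int completedSteps = 0;
        for (int step = 0; step < 100; step++)
        {
            token.ThrowIfCancellationRequested(); // cooperative cancellation point
            Thread.Sleep(10);                     // simulated work
            completedSteps++;
        }
        return completedSteps;
    }

    // Starts the work, requests cancellation after the given delay,
    // and reports whether the task ended in the Canceled state.
    public static bool RunAndCancel(int cancelAfterMilliseconds)
    {
        using (var cts = new CancellationTokenSource())
        {
            // Passing the same token to StartNew lets the task end as Canceled rather than Faulted.
            Task<int> task = Task.Factory.StartNew(
                () => CalculateSlot(0, cts.Token), cts.Token);

            Thread.Sleep(cancelAfterMilliseconds);
            cts.Cancel(); // e.g. the user pressed a Cancel button

            try
            {
                task.Wait();
                return false; // the work finished before cancellation was observed
            }
            catch (AggregateException ex)
            {
                // Swallow cancellation only; anything else is rethrown by Handle.
                ex.Handle(inner => inner is OperationCanceledException);
                return task.IsCanceled;
            }
        }
    }
}
```

The work only stops at the points where it checks the token, so a long computation needs such checks placed inside its loops.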

Related

C# launching task with non-async function inside

Basic overview: the program should launch a task that parses an array of data and occasionally enqueues tasks to process it, one at a time. The test rig has a button and two labels to display debug info. TaskQueue is a class wrapping SemaphoreSlim, taken from this thread
Dispatcher dispatch = Application.Current.Dispatcher;
async void Test_Click(object s, RoutedEventArgs e)
{
    TaskQueue queue = new TaskQueue();
    // Blocks thread if SimulateParse does not have await inside
    await SimulateParse(queue);
    //await Task.Run(() => SimulateParse(queue));
    lblStatus2.Content = string.Format("Awaiting queue");
    await queue.WaitAsync(); //this is just SemaphoreSlim.WaitAsync()
    lblStatus.Content = string.Format("Ready");
    lblStatus2.Content = string.Format("Ready");
    MessageBox.Show("Ok");
}
async Task SimulateParse(TaskQueue queue)
{
    Random rnd = new Random();
    int counter = 0; // representing some piece of data
    for (int i = 0; i < 500; i++)
    {
        dispatch.Invoke(() => lblStatus2.Content = string.Format("Check {0}", ++counter));
        Thread.Sleep(25); //no await variant
        //await Task.Delay(25);
        // if some condition matched - queue work
        if (rnd.Next(1, 11) < 2)
        {
            // Blocks thread even though Enqueue() has await inside
            queue.Enqueue(SimulateWork, counter);
            //Task.Run(() => queue.Enqueue(SimulateWork, counter));
        }
    }
}
async Task SimulateWork(object par)
{
    dispatch.Invoke(() => lblStatus.Content = string.Format("Working with {0}", par));
    Thread.Sleep(400); //no await variant
    //await Task.Delay(400);
}
It seems that this works only if the launched task has an await inside it; that is, if you try to launch a task without an await inside it, it will block the current thread.
This rig works as intended if the commented-out lines are used, but that looks like an excessive number of calls; also, the real versions of SimulateParse and SimulateWork do not need to await anything. The main question is: what is the optimal way to launch a task with a non-async function inside it? Do I just need to wrap them in Task.Run(), as in the commented-out lines?
TaskQueue is used here to run tasks one by one
It will run them one at a time, yes. SemaphoreSlim does have an implicit queue, but it's not strictly a FIFO-queue. Most synchronization primitives have a mostly-but-not-quite-FIFO implementation, which is Close Enough. This is because they are synchronization primitives, and not queues.
If you want an actual queue (i.e., with guaranteed FIFO order), then you should use a queue, such as TPL Dataflow or System.Threading.Channels.
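To make the "actual queue" option concrete, here is a minimal sketch using System.Threading.Channels (available as a NuGet package and included with recent .NET versions). FifoQueueSketch and ProcessInOrderAsync are hypothetical names; a single consumer reads items in the order they were written, so processing is strictly FIFO and one at a time.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class FifoQueueSketch
{
    // Writes the items to a channel and processes them one at a time, in order.
    public static async Task<List<int>> ProcessInOrderAsync(IEnumerable<int> items)
    {
        Channel<int> channel = Channel.CreateUnbounded<int>(
            new UnboundedChannelOptions { SingleReader = true });
        var processed = new List<int>();

        // Single consumer: drains the channel until the writer signals completion.
        Task consumer = Task.Run(async () =>
        {
            while (await channel.Reader.WaitToReadAsync())
            {
                while (channel.Reader.TryRead(out int item))
                    processed.Add(item); // stand-in for the real per-item work
            }
        });

        foreach (int item in items)
            await channel.Writer.WriteAsync(item);
        channel.Writer.Complete(); // no more items; lets the consumer loop end

        await consumer;
        return processed;
    }
}
```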
if you try to launch a task without an await inside it, it will block the current thread.
All async methods begin executing on the current thread, as described on my blog. async does not mean "run on a different thread". If you want to run a method on a thread pool thread, then wrap that method call in Task.Run. That's a much cleaner solution than sprinkling Task.Delay throughout, and it's more efficient, too (no delays).
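The difference can be seen in a small sketch like the one below. All names are hypothetical, and Parse only stands in for the real synchronous work. An async method with no await runs entirely on the caller's thread, while the same work wrapped in Task.Run is queued to the thread pool.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class TaskRunSketch
{
    // Synchronous work with nothing to await; returns the id of the thread that ran it.
    public static int Parse()
    {
        Thread.Sleep(100); // simulated work
        return Thread.CurrentThread.ManagedThreadId;
    }

    // Marked async but contains no await (the compiler warns about this, CS1998):
    // the whole body runs synchronously on the caller's thread.
    public static async Task<int> ParseWithoutAwaitAsync()
    {
        return Parse();
    }

    // Returns { caller thread id, thread id for the async-without-await call,
    // thread id for the Task.Run call }.
    public static async Task<int[]> CompareAsync()
    {
        int caller = Thread.CurrentThread.ManagedThreadId;
        int inline = await ParseWithoutAwaitAsync();   // blocks the caller for 100 ms
        int offloaded = await Task.Run(() => Parse()); // runs on a thread pool thread
        return new[] { caller, inline, offloaded };
    }
}
```

When CompareAsync is called from a UI thread, the first two ids are the same, and the third belongs to a thread pool thread.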

How to limit number of async IO tasks to database?

I have a list of IDs and I want to get data for each of those IDs in parallel from a database. My ExecuteAsync method below is called at very high throughput, and for each request we have around 500 IDs for which I need to extract data.
So I have the code below, where I loop over the list of IDs and make async calls for each ID in parallel, and it works fine.
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
    Func<CancellationToken, int, Task<T>> mapper) where T : class
{
    var tasks = new List<Task<T>>(ids.Count);
    // invoking multiple id in parallel to get data for each id from database
    for (int i = 0; i < ids.Count; i++)
    {
        tasks.Add(Execute(policy, ct => mapper(ct, ids[i])));
    }
    // wait for all id response to come back
    var responses = await Task.WhenAll(tasks);
    var excludeNull = new List<T>(ids.Count);
    for (int i = 0; i < responses.Length; i++)
    {
        var response = responses[i];
        if (response != null)
        {
            excludeNull.Add(response);
        }
    }
    return excludeNull;
}
private async Task<T> Execute<T>(IPollyPolicy policy,
    Func<CancellationToken, Task<T>> requestExecuter) where T : class
{
    var response = await policy.Policy.ExecuteAndCaptureAsync(
        ct => requestExecuter(ct), CancellationToken.None);
    if (response.Outcome == OutcomeType.Failure)
    {
        if (response.FinalException != null)
        {
            // log error
            throw response.FinalException;
        }
    }
    return response?.Result;
}
Question:
Now, as you can see, I am looping over all the IDs and making a bunch of async calls to the database in parallel, one per ID, which can put a lot of load on the database (depending on how many requests are coming in). So I want to limit the number of async calls we make to the database. I modified ExecuteAsync to use Semaphore as shown below but it doesn't look like it does what I want it to do:
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
    Func<CancellationToken, int, Task<T>> mapper) where T : class
{
    var throttler = new SemaphoreSlim(250);
    var tasks = new List<Task<T>>(ids.Count);
    // invoking multiple id in parallel to get data for each id from database
    for (int i = 0; i < ids.Count; i++)
    {
        await throttler.WaitAsync().ConfigureAwait(false);
        try
        {
            tasks.Add(Execute(policy, ct => mapper(ct, ids[i])));
        }
        finally
        {
            throttler.Release();
        }
    }
    // wait for all id response to come back
    var responses = await Task.WhenAll(tasks);
    // same excludeNull code check here
    return excludeNull;
}
Does Semaphore work on threads or tasks? From what I read here, it looks like Semaphore is for threads and SemaphoreSlim is for tasks.
Is this correct? If so, what's the best way to fix this and limit the number of async I/O tasks we make to the database here?
Task is an abstraction over threads, and doesn't necessarily create a new thread. Semaphore limits the number of threads that can access that for loop. Execute returns a Task, which is not a thread. If there's only 1 request, there will be only 1 thread inside that for loop, even if it is asking for 500 IDs. The 1 thread sends off all the async I/O tasks itself.
Sort of. I would not say that tasks are related to threads at all. There are actually two kinds of tasks: a delegate task (which is kind of an abstraction of a thread), and a promise task (which has nothing to do with threads).
Regarding the SemaphoreSlim, it does limit the concurrency of a block of code (not threads).
I recently started playing with C#, so it looks like my understanding of threads and tasks is not right.
I recommend reading my async intro and best practices. Follow up with There Is No Thread if you're interested in more detail about how threads aren't really involved.
I modified ExecuteAsync to use Semaphore as shown below but it doesn't look like it does what I want it to do
The current code is only throttling the adding of the tasks to the list, which is only done one at a time anyway. What you want to do is throttle the execution itself:
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
    Func<CancellationToken, int, Task<T>> mapper) where T : class
{
    var throttler = new SemaphoreSlim(250);
    var tasks = new List<Task<T>>(ids.Count);
    // invoking multiple id in parallel to get data for each id from database
    for (int i = 0; i < ids.Count; i++)
        tasks.Add(ThrottledExecute(ids[i]));
    // wait for all id response to come back
    var responses = await Task.WhenAll(tasks);
    // same excludeNull code check here
    return excludeNull;
    async Task<T> ThrottledExecute(int id)
    {
        await throttler.WaitAsync().ConfigureAwait(false);
        try
        {
            return await Execute(policy, ct => mapper(ct, id)).ConfigureAwait(false);
        }
        finally
        {
            throttler.Release();
        }
    }
}
Your colleague probably has in mind the Semaphore class, which is indeed a thread-centric throttler, with no asynchronous capabilities.
Limits the number of threads that can access a resource or pool of resources concurrently.
The SemaphoreSlim class is a lightweight alternative to Semaphore, which includes the asynchronous method WaitAsync, that makes all the difference in the world. The WaitAsync doesn't block a thread, it blocks an asynchronous workflow. Asynchronous workflows are cheap (usually less than 1000 bytes each). You can have millions of them "running" concurrently at any given moment. This is not the case with threads, because of the 1 MB of memory that each thread reserves for its stack.
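To see the throttling effect of WaitAsync in isolation, a small self-contained sketch can count how many simulated calls are in flight at once. ThrottleSketch and MeasurePeakConcurrencyAsync are hypothetical names, and Task.Delay stands in for the database call.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleSketch
{
    // Starts `count` simulated I/O calls, allows at most `limit` in flight,
    // and returns the highest number observed in flight at the same time.
    public static async Task<int> MeasurePeakConcurrencyAsync(int count, int limit)
    {
        var throttler = new SemaphoreSlim(limit);
        var gate = new object();
        int inFlight = 0;
        int peak = 0;

        async Task WorkAsync()
        {
            // Waits asynchronously (no thread is blocked) until a slot is free.
            await throttler.WaitAsync().ConfigureAwait(false);
            try
            {
                lock (gate)
                {
                    inFlight++;
                    if (inFlight > peak) peak = inFlight;
                }

                await Task.Delay(20).ConfigureAwait(false); // simulated database call

                lock (gate) { inFlight--; }
            }
            finally
            {
                throttler.Release(); // free the slot even if the call throws
            }
        }

        // All workflows are created up front; the semaphore decides how many proceed.
        Task[] tasks = Enumerable.Range(0, count).Select(_ => WorkAsync()).ToArray();
        await Task.WhenAll(tasks).ConfigureAwait(false);
        return peak;
    }
}
```

With 20 calls and a limit of 3, the returned peak never exceeds 3, even though all 20 asynchronous workflows exist from the start.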
As for the ExecuteAsync method, here is how you could refactor it by using the LINQ methods Select, Where, ToArray and ToList:
Update: The Polly library supports capturing and continuing on the current synchronization context, so I added a bool executeOnCurrentContext argument to the API. I also renamed the asynchronous Execute method to ExecuteAsync, to be in line with the guidelines.
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
    Func<CancellationToken, int, Task<T>> mapper,
    int concurrencyLevel = 1, bool executeOnCurrentContext = false) where T : class
{
    var throttler = new SemaphoreSlim(concurrencyLevel);
    Task<T>[] tasks = ids.Select(async id =>
    {
        await throttler.WaitAsync().ConfigureAwait(executeOnCurrentContext);
        try
        {
            return await ExecuteAsync(policy, ct => mapper(ct, id),
                executeOnCurrentContext).ConfigureAwait(false);
        }
        finally
        {
            throttler.Release();
        }
    }).ToArray();
    T[] results = await Task.WhenAll(tasks).ConfigureAwait(false);
    return results.Where(r => r != null).ToList();
}
private async Task<T> ExecuteAsync<T>(IPollyPolicy policy,
    Func<CancellationToken, Task<T>> function,
    bool executeOnCurrentContext = false) where T : class
{
    var response = await policy.Policy.ExecuteAndCaptureAsync(
        ct => executeOnCurrentContext ? function(ct) : Task.Run(() => function(ct)),
        CancellationToken.None, continueOnCapturedContext: executeOnCurrentContext)
        .ConfigureAwait(executeOnCurrentContext);
    if (response.Outcome == OutcomeType.Failure)
    {
        if (response.FinalException != null)
        {
            ExceptionDispatchInfo.Throw(response.FinalException);
        }
    }
    return response?.Result;
}
You are throttling the rate at which you add tasks to the list. You are not throttling the rate at which tasks are executed. To do that, you'd probably have to implement your semaphore calls inside the Execute method itself.
If you can't modify Execute, another way to do it is to poll for completed tasks, sort of like this:
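for (int i = 0; i < ids.Count; i++)
{
    // Re-count the pending tasks on every pass; a count taken once would never change and the loop would never exit
    while (tasks.Count(t => !t.IsCompleted) >= 500) await Task.Yield();
    var id = ids[i]; // copy, so the lambda does not capture the loop variable
    tasks.Add(Execute(policy, ct => mapper(ct, id)));
}
await Task.WhenAll(tasks);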
Actually, the TPL is capable of controlling task execution and limiting concurrency. You can test how many parallel tasks are suitable for your use case. There is no need to think about threads; the TPL will manage everything for you.
To use limited concurrency, see this answer (credits to @panagiotis-kanavos):
.Net TPL: Limited Concurrency Level Task scheduler with task priority?
The example code is (even using different priorities, you can strip that):
QueuedTaskScheduler qts = new QueuedTaskScheduler(TaskScheduler.Default, 4);
TaskScheduler pri0 = qts.ActivateNewQueue(priority: 0);
TaskScheduler pri1 = qts.ActivateNewQueue(priority: 1);
Task.Factory.StartNew(() => { },
    CancellationToken.None,
    TaskCreationOptions.None,
    pri0);
Just throw all your tasks to the queue and with Task.WhenAll you can wait till everything is done.

Use Parallel.ForEach on method returning task - avoid Task.WaitAll

I've got a method which takes an IWorkItem, starts work on it and returns the related task. The method has to look like this because of the external library used.
public Task WorkOn(IWorkItem workItem)
{
    //...start asynchronous operation, return task
}
I want to do this work on multiple work items. I don't know how many of them there will be: maybe 1, maybe 10,000.
The WorkOn method has internal pooling and may involve waiting if too many parallel executions are reached (like in SemaphoreSlim.Wait):
public Task WorkOn(IWorkItem workItem)
{
    _semaphoreSlim.Wait();
    //...start asynchronous operation, return task
}
My current solution is:
public void Do(params IWorkItem[] workItems)
{
    var tasks = new Task[workItems.Length];
    for (var i = 0; i < workItems.Length; i++)
    {
        tasks[i] = WorkOn(workItems[i]);
    }
    Task.WaitAll(tasks);
}
Question: can I somehow use Parallel.ForEach in this case, to avoid creating 10,000 tasks that then wait because of WorkOn's throttling?
That is actually not that easy. You can use Parallel.ForEach to throttle the number of tasks that are spawned, but I am unsure how that will perform or behave in your situation.
As a general rule of thumb, I usually try to avoid mixing Task and Parallel.
Surely you can do something like this:
public void Do(params IWorkItem[] workItems)
{
    Parallel.ForEach(workItems, (workItem) => WorkOn(workItem).Wait());
}
Under "normal" conditions this should limit your concurrency nicely.
You could also go full async-await and add some limiting to your concurrency with some tricks. But you have to do the concurrency limiting yourself in that case.
const int ConcurrencyLimit = 8;
public async Task Do(params IWorkItem[] workItems)
{
    var cursor = 0;
    var currentlyProcessing = new List<Task>(ConcurrencyLimit);
    while (cursor < workItems.Length)
    {
        while (currentlyProcessing.Count < ConcurrencyLimit && cursor < workItems.Length)
        {
            currentlyProcessing.Add(WorkOn(workItems[cursor]));
            cursor++;
        }
        Task finished = await Task.WhenAny(currentlyProcessing);
        currentlyProcessing.Remove(finished);
    }
    await Task.WhenAll(currentlyProcessing);
}
As I said... a lot more complicated. But it will limit the concurrency to any value you choose. In addition, it properly uses the async-await pattern. If you don't need the call to be non-blocking, you can easily wrap this function in another function and do a blocking .Wait on the task it returns.
The key to this implementation is the Task.WhenAny function. It returns one finished task from the supplied list of tasks (wrapped in another task for the await).

C# tasks with dynamic delay

I have a function that needs to process items 3 at a time, and if the total time taken is less than x seconds, the thread should sleep for the remaining seconds before proceeding further.
So I'm doing the following:
private void ProcessItems()
{
    for (int i = 0, n = items.Count; i < n; i++)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        batch.Add(items[i]);
        if (batch.Count == 3 || i >= items.Count - 3)
        {
            List<Task> tasks = new List<Task>(3);
            foreach (Item item in batch)
                tasks.Add(Task.Factory.StartNew(() => ProcessItem(item)));
            Task.WaitAll(tasks.ToArray());
            batch.Clear();
        }
        stopwatch.Stop();
        int elapsed = (int)stopwatch.ElapsedMilliseconds;
        int delay = (3000) - elapsed;
        if (delay > 0)
            Thread.Sleep(delay);
    }
}
The ProcessItem function makes a webrequest and processes the response (callback). This is the function that takes a small amount of time.
However, if I understand tasks correctly, a thread can have multiple tasks. Therefore, if I sleep the thread, other tasks can be affected.
Is there a more efficient way to achieve the above, and can tasks be used within Parallel.Foreach?
Tasks run on automatically managed threads. There is nothing intrinsically wrong with blocking a thread. It is just a little wasteful.
Here is how I would implement this very cleanly:
MyItem[] items = ...;
foreach (MyItem[] itemsChunk in items.AsChunked(3))
{
    Parallel.ForEach(itemsChunk, item => Process(item));
    //here you can insert a delay
}
This wastes not a single thread and is trivially simple. Parallel.ForEach uses the current thread to process work items as well, so it does not sit idle. You can add your delay logic as well. Implementing AsChunked is left as an exercise for the reader... This function is supposed to split a list into chunks of the given size (3). The good thing about such a helper function is that it untangles the batching logic from the important parts.
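One possible shape for such a helper is sketched below. This is only an assumption about what AsChunked could look like, not the answerer's implementation; it yields arrays of at most the requested size and preserves the input order. (On .NET 6 and later, the built-in Enumerable.Chunk does much the same job.)

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingExtensions
{
    // Hypothetical helper: splits a sequence into arrays of at most `size` items.
    public static IEnumerable<T[]> AsChunked<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return ChunkIterator(source, size); // arguments are validated eagerly, iteration is lazy
    }

    private static IEnumerable<T[]> ChunkIterator<T>(IEnumerable<T> source, int size)
    {
        var chunk = new List<T>(size);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk.ToArray(); // hand out a copy, then reuse the buffer
                chunk.Clear();
            }
        }
        if (chunk.Count > 0)
            yield return chunk.ToArray(); // final, possibly shorter, chunk
    }
}
```

For seven items and a chunk size of 3, this yields chunks of 3, 3 and 1 items.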
Use Task.Delay instead:
static async Task DoSomeProcess()
{
    await Task.Delay(3000);
}
You are right, Thread.Sleep would block other tasks
Yes you can pair async/await pattern with Parallel.
Your ProcessItems method can be very easily transformed into async version ProcessItemsAsync (I didn't validate the "batching" logic):
private async Task ProcessItemsAsync()
{
    for (int i = 0, n = items.Count; i < n; i++)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        batch.Add(items[i]);
        if (batch.Count == 3 || i >= items.Count - 3)
        {
            List<Task> tasks = new List<Task>(3);
            foreach (Item item in batch)
                tasks.Add(Task.Run(() => ProcessItem(item)));
            await Task.WhenAll(tasks.ToArray());
            batch.Clear();
        }
        stopwatch.Stop();
        int elapsed = (int)stopwatch.ElapsedMilliseconds;
        int delay = (3000) - elapsed;
        if (delay > 0)
            await Task.Delay(delay);
    }
}
The only benefit would be that you don't block the ProcessItems thread with Task.WaitAll() and Thread.Sleep(), as @usr pointed out in his answer. Whether to take this approach or the Parallel.ForEach one probably depends on the running environment of your code. Async/await won't make your code run faster, but it will improve its scalability for server-side execution, because it may take fewer threads to run, so more clients could be served.
Note also that now ProcessItemsAsync is itself an async task, so to keep the flow of the code which calls it unchanged, you'd need to call it like this:
ProcessItemsAsync().Wait();
Which is a blocking call on its own and may kill the advantage of async we just gained. Whether you can completely eliminate blocks like this in your app or not, largely depends on the rest of the app's workflow.

Why would a long running Task still block the UI?

I'm trying to resolve a problem where my UI is being blocked and I don't understand why.
public Task AddStuff(string myID, List<string> otherIDs)
{
    Action doIt = () =>
    {
        this.theService.AddStuff(myID, otherIDs);
    };
    return Task.Factory.StartNew(doIt, TaskCreationOptions.LongRunning);
}
If the list is long the call can take 30 seconds and the entire application becomes unresponsive (goes that washed out white in Windows 7).
Is there a different way to do this so it doesn't block the UI?
Edit
Ok, so there's a LOT of code around this, so I'm going to try to keep this pertinent. Going back to the original code, I realized that I had removed something that may have been important. Should I maybe use a different TaskScheduler than TaskScheduler.Current?
Also there are no Wait statements impeding any of this code, and the service doesn't interact with the UI.
Task.Factory.StartNew(objState =>
{
    LoadAssets(objState);
}, state, this.cancellationToken, TaskCreationOptions.LongRunning, TaskScheduler.Current);
private void LoadAssets(object objState)
{
    LoadAssetsState laState = (LoadAssetsState)objState;
    List<string> assetIDs = new List<string>();
    for (int i = 0; i < laState.AddedMediaItems.Count; i++)
    {
        if (laState.CancellationToken.IsCancellationRequested)
            return;
        string assetId = this.SelectFilesStep.AssetService.GetAssetId(laState.AddedMediaItems[i], laState.ActiveOrder.OrderID);
        assetIDs.Add(assetId);
    }
    if (laState.CancellationToken.IsCancellationRequested)
        return;
    this.ApiContext.AddAssetToProduct(laState.ActiveOrder.OrderID, laState.ActiveProduct.LineID, assetIDs, laState.Quantity, laState.CancellationToken).ContinueWith(task =>
    {
        if (laState.CancellationToken.IsCancellationRequested)
            return;
        App.ApiContext.GetOrderDetails(laState.ActiveOrder.OrderID, false, laState.CancellationToken).ContinueWith(orderDetailsTask =>
        {
            if (laState.CancellationToken.IsCancellationRequested)
                return;
            this.activeOrder = orderDetailsTask.Result;
            this.StandardPrintProductsStep.Synchronize(this.activeOrder);
        });
    });
}
public Task AddAssetToProduct(string orderID, string lineID, List<string> assetIDs, int quantity, CancellationToken? cancellationToken = null)
{
    Action doIt = () =>
    {
        if (cancellationToken.IsCancellationRequested())
            return;
        this.ordersService.AddAssetToProduct(orderID, lineID, assetIDs, quantity);
    };
    if (cancellationToken != null)
        return Task.Factory.StartNew(doIt, cancellationToken.Value, TaskCreationOptions.LongRunning, TaskScheduler.Current);
    else
        return Task.Factory.StartNew(doIt, TaskCreationOptions.LongRunning);
}
EDIT
I have placed break points just before and after the service call and it is the service call that is blocking the UI, as opposed to any other line.
It sounds like there is no reason this should be blocking, so I think I'm just going to break the list down if it's long and make multiple calls. I just wanted to make sure I wasn't missing something here with my Task logic.
Is there a different way to do this so it doesn't block the UI?
This call, in and of itself, should not block the UI. If, however, theService.AddStuff does some synchronization with the UI's SynchronizationContext, this could cause the UI to effectively be blocked by that call.
Otherwise, the problem is likely happening from outside of this function. For example, if you call Wait() on the task returned from this method, in a UI thread, the UI thread will be blocked until this completes.
You probably want to use TaskScheduler.Default, not TaskScheduler.Current. If this is being called within a Task that's scheduled on a TaskScheduler based on the UI thread, it will schedule itself on the UI thread.
Wish I could put formatted code in a comment, but since I don't see how, adding this snippet as an answer. This is the kind of approach I'd use to figure out whether the task is running on the UI thread or not (since you don't want it to) and have the action be something completely different (a simple thread.sleep).
var state = new object();
var cancellationTokenSource = new CancellationTokenSource();
var cancellationToken = cancellationTokenSource.Token;
var task = Task.Factory.StartNew(
    objState => { Console.WriteLine("Current thread is {0}", Thread.CurrentThread.ManagedThreadId); Thread.Sleep(30); },
    state,
    cancellationToken,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Current);
task.Wait();
