Combining a while loop with Task.Run() in C# - c#

I'm pretty new to multithreaded applications in C#, and I'm trying to edit my code below so that it runs on multiple threads. Right now it operates synchronously and it takes up very little CPU power. I need it to run much faster on multiple threads. My thought was to start a task for each core and then, when a task finishes, allow another to take its place, or something like that if it's possible.
static void Main(string[] args)
{
string connectionString = CloudConfigurationManager.GetSetting("Microsoft.ServiceBus.ConnectionString");
QueueClient Client = QueueClient.CreateFromConnectionString(connectionString, "OoplesQueue");
try
{
while (true)
{
Task.Run(() => processCalculations(Client));
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
public static ConnectionMultiplexer connection;
public static IDatabase cache;
public static async Task processCalculations(QueueClient client)
{
try
{
BrokeredMessage message = await client.ReceiveAsync();
if (message != null)
{
if (connection == null || !connection.IsConnected)
{
connection = await ConnectionMultiplexer.ConnectAsync("connection,SyncTimeout=10000,ConnectTimeout=10000");
//connection = ConnectionMultiplexer.Connect("connection,SyncTimeout=10000,ConnectTimeout=10000");
}
cache = connection.GetDatabase();
string sandpKey = message.Properties["sandp"].ToString();
string dateKey = message.Properties["date"].ToString();
string symbolclassKey = message.Properties["symbolclass"].ToString();
string stockdataKey = message.Properties["stockdata"].ToString();
string stockcomparedataKey = message.Properties["stockcomparedata"].ToString();
List<StockData> sandp = cache.Get<List<StockData>>(sandpKey);
DateTime date = cache.Get<DateTime>(dateKey);
SymbolInfo symbolinfo = cache.Get<SymbolInfo>(symbolclassKey);
List<StockData> stockdata = cache.Get<List<StockData>>(stockdataKey);
List<StockMarketCompare> stockcomparedata = cache.Get<List<StockMarketCompare>>(stockcomparedataKey);
StockRating rating = performCalculations(symbolinfo, date, sandp, stockdata, stockcomparedata);
if (rating != null)
{
saveToTable(rating);
if ((message.LockedUntilUtc - DateTime.UtcNow).TotalMinutes <= 1) // renew if the lock expires within a minute
{
await message.RenewLockAsync();
}
await message.CompleteAsync();
}
else
{
Console.WriteLine("Message " + message.MessageId + " Completed!");
await message.CompleteAsync();
}
}
}
catch (TimeoutException time)
{
Console.WriteLine(time.Message);
}
catch (MessageLockLostException locks)
{
Console.WriteLine(locks.Message);
}
catch (RedisConnectionException redis)
{
Console.WriteLine("Start the redis server service!");
}
catch (MessagingCommunicationException communication)
{
Console.WriteLine(communication.Message);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}

This looks like a classic producer-consumer pattern.
In this case, where you need concurrency combined with async IO-bound operations (such as retrieving data from a Redis cache) and CPU-bound operations (such as compute-bound calculations), I'd leverage TPL Dataflow for the job.
You can use an ActionBlock<T>, which invokes a delegate for each item you pass to it. Behind the scenes it takes care of concurrency, and you can limit the degree of parallelism by passing it an ExecutionDataflowBlockOptions.
You start off by creating the ActionBlock<BrokeredMessage>:
private static void Main(string[] args)
{
string connectionString = CloudConfigurationManager.GetSetting("Microsoft.ServiceBus.ConnectionString");
QueueClient client = QueueClient.CreateFromConnectionString(connectionString, "OoplesQueue");
// ProcessCalculationsAsync is your processCalculations, changed to take the received message.
var actionBlock = new ActionBlock<BrokeredMessage>(async message =>
await ProcessCalculationsAsync(message),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
});
// Block Main until all messages have been received and processed.
ProduceBrokeredMessagesAsync(client, actionBlock).Wait();
}
Now let's look at what ProduceBrokeredMessagesAsync does. It simply receives your QueueClient and the ActionBlock and does the following:
private static async Task ProduceBrokeredMessagesAsync(QueueClient client,
ActionBlock<BrokeredMessage> actionBlock)
{
BrokeredMessage message;
// ReceiveAsync() returns null when no message arrives within its wait time, which ends the loop.
while ((message = await client.ReceiveAsync()) != null)
{
await actionBlock.SendAsync(message);
}
actionBlock.Complete();
await actionBlock.Completion;
}
What this does is: while you receive messages from your QueueClient, it asynchronously posts each message to the ActionBlock, which processes those messages concurrently.
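Below is a minimal, self-contained sketch of the same producer/consumer shape, with the Service Bus queue replaced by an in-memory sequence so it runs without Azure. The class name, the int items and the 20 ms delay are illustrative assumptions, and the sketch assumes the System.Threading.Tasks.Dataflow library is referenced (on .NET Framework it comes from the NuGet package of the same name). It also records the peak number of handlers running together, to show that MaxDegreeOfParallelism is honoured.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ActionBlockSketch
{
    // Producer loop posts items; the ActionBlock runs at most maxParallel handlers at once.
    // Returns the peak number of concurrent handlers and how many items were handled.
    public static async Task<(int Peak, int Count)> ProcessAllAsync(IEnumerable<int> items, int maxParallel)
    {
        var gate = new object();
        int running = 0, peak = 0;
        var handled = new ConcurrentBag<int>();

        var block = new ActionBlock<int>(async item =>
        {
            lock (gate) { running++; peak = Math.Max(peak, running); }
            try
            {
                await Task.Delay(20); // stand-in for ProcessCalculationsAsync(message)
                handled.Add(item);
            }
            finally
            {
                lock (gate) { running--; }
            }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        // Producer: plays the role of the ReceiveAsync loop.
        foreach (var item in items)
            await block.SendAsync(item);

        block.Complete();       // signal that no more items are coming
        await block.Completion; // wait until every queued item has been handled
        return (peak, handled.Count);
    }
}
```

SendAsync (rather than Post) matters once you also set BoundedCapacity on the block: the producer then waits for room instead of having items declined.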

Right now it operates synchronously and it takes up very little cpu power. I need it to run much faster on multiple threads.
"Multiple threads" doesn't necessarily mean "faster". That is only true if you have multiple calculations to perform that are independent of each other, and they are CPU-bound (meaning they mainly involve CPU operations, not IO operations).
Additionally, async doesn't necessarily mean multiple threads. It just means your operation is not blocking a thread while it is in progress. If you start another thread and block it, that looks like async, but it really isn't. Check out this Channel 9 video: Async Library Methods Shouldn't Lie
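A small sketch of that distinction, using only the base class library: ten simulated I/O waits (Task.Delay) started together finish in roughly the time of one, because a pending Task.Delay is backed by a timer rather than a blocked thread. The class and method names are illustrative.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncIsNotThreads
{
    // Starts `count` simulated I/O waits at once and awaits them together.
    // No thread sits blocked while the delays are pending.
    public static async Task<TimeSpan> RunConcurrentWaitsAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        var waits = Enumerable.Range(0, count).Select(_ => Task.Delay(delayMs));
        await Task.WhenAll(waits); // elapsed time is close to one delay, not count * delay
        return sw.Elapsed;
    }
}
```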
Most of your operations in processCalculations look like they are dependent on each other; however, this part might be a potential improvement point:
List<StockData> sandp = cache.Get<List<StockData>>(sandpKey);
DateTime date = cache.Get<DateTime>(dateKey);
SymbolInfo symbolinfo = cache.Get<SymbolInfo>(symbolclassKey);
List<StockData> stockdata = cache.Get<List<StockData>>(stockdataKey);
List<StockMarketCompare> stockcomparedata = cache.Get<List<StockMarketCompare>>(stockcomparedataKey);
StockRating rating = performCalculations(symbolinfo, date, sandp, stockdata, stockcomparedata);
I'm not familiar with the API you're using, but IF it includes an async equivalent of the Get method, you might be able to run those IO operations asynchronously and concurrently, e.g.:
var sandpTask = cache.GetAsync<List<StockData>>(sandpKey);
var dateTask = cache.GetAsync<DateTime>(dateKey);
var symbolinfoTask = cache.GetAsync<SymbolInfo>(symbolclassKey);
var stockdataTask = cache.GetAsync<List<StockData>>(stockdataKey);
var stockcomparedataTask = cache.GetAsync<List<StockMarketCompare>>(stockcomparedataKey);
await Task.WhenAll(sandpTask, dateTask,symbolinfoTask,
stockdataTask, stockcomparedataTask);
List<StockData> sandp = sandpTask.Result;
DateTime date = dateTask.Result;
SymbolInfo symbolinfo = symbolinfoTask.Result;
List<StockData> stockdata = stockdataTask.Result;
List<StockMarketCompare> stockcomparedata = stockcomparedataTask.Result;
StockRating rating = performCalculations(symbolinfo, date, sandp, stockdata, stockcomparedata);
Also, note that you don't need to wrap the processCalculations call in another Task, since it already returns a task:
// instead of Task.Run(() => processCalculations(Client));
processCalculations(Client);

You need two parts:
Part 1 waits for an incoming message (client.ReceiveAsync() in your code) in a simple loop. Whenever something is received, an instance of Part 2 is started to process the incoming message.
Part 2 runs in another thread / in the background and processes a single incoming message.
That way, several instances of Part 2 may run in parallel.
So your structure is like this:
while (true)
{
var message = await client.ReceiveAsync();
StartProcessCalculationsInBackground(message, ...); // returns immediately
}
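A sketch of the two-part structure with a cap on how many Part 2 instances run at once. It assumes the receive and processing steps are passed in as delegates (hypothetical stand-ins for client.ReceiveAsync() and the question's calculation code) and that a null result means there is nothing left. A SemaphoreSlim slot is taken before each receive, so Part 1 never holds a message while no worker is free.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ReceiveLoopSketch
{
    // Part 1 is the while loop; Part 2 is the Task.Run body started for each message.
    public static async Task RunAsync<T>(
        Func<Task<T>> receiveAsync,   // hypothetical: returns null when nothing is left
        Func<T, Task> processAsync,   // hypothetical: processes one message
        int maxConcurrent) where T : class
    {
        using var throttle = new SemaphoreSlim(maxConcurrent);
        var running = new List<Task>();

        while (true)
        {
            // Wait asynchronously for a free slot before receiving the next message.
            await throttle.WaitAsync();
            T message = await receiveAsync();
            if (message == null) { throttle.Release(); break; }

            // Part 2: process in the background and free the slot when done.
            running.Add(Task.Run(async () =>
            {
                try { await processAsync(message); }
                finally { throttle.Release(); }
            }));
        }

        // Let the in-flight Part 2 instances finish before returning.
        await Task.WhenAll(running);
    }
}
```

For brevity the sketch keeps every started task in a list; a long-running service would prune completed ones.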

Related

C# .NET Parallel I/O operation (with throttling) [duplicate]

I would like to run a bunch of async tasks, with a limit on how many tasks may be pending completion at any given time.
Say you have 1000 URLs, and you only want to have 50 requests open at a time; but as soon as one request completes, you open up a connection to the next URL in the list. That way, there are always exactly 50 connections open at a time, until the URL list is exhausted.
I also want to utilize a given number of threads if possible.
I came up with an extension method, ThrottleTasksAsync, that does what I want. Is there a simpler solution already out there? I would assume that this is a common scenario.
Usage:
class Program
{
static void Main(string[] args)
{
Enumerable.Range(1, 10).ThrottleTasksAsync(5, 2, async i => { Console.WriteLine(i); return i; }).Wait();
Console.WriteLine("Press a key to exit...");
Console.ReadKey(true);
}
}
Here is the code:
static class IEnumerableExtensions
{
public static async Task<Result_T[]> ThrottleTasksAsync<Enumerable_T, Result_T>(this IEnumerable<Enumerable_T> enumerable, int maxConcurrentTasks, int maxDegreeOfParallelism, Func<Enumerable_T, Task<Result_T>> taskToRun)
{
var blockingQueue = new BlockingCollection<Enumerable_T>(new ConcurrentBag<Enumerable_T>());
var semaphore = new SemaphoreSlim(maxConcurrentTasks);
// Run the throttler on a separate thread.
var t = Task.Run(() =>
{
foreach (var item in enumerable)
{
// Wait for the semaphore
semaphore.Wait();
blockingQueue.Add(item);
}
blockingQueue.CompleteAdding();
});
var taskList = new List<Task<Result_T>>();
Parallel.ForEach(IterateUntilTrue(() => blockingQueue.IsCompleted), new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism },
_ =>
{
Enumerable_T item;
if (blockingQueue.TryTake(out item, 100))
{
taskList.Add(
// Run the task
taskToRun(item)
.ContinueWith(tsk =>
{
// For effect
Thread.Sleep(2000);
// Release the semaphore
semaphore.Release();
return tsk.Result;
}
)
);
}
});
// Await all the tasks.
return await Task.WhenAll(taskList);
}
static IEnumerable<bool> IterateUntilTrue(Func<bool> condition)
{
while (!condition()) yield return true;
}
}
The method utilizes BlockingCollection and SemaphoreSlim to make it work. The throttler is run on one thread, and all the async tasks are run on the other thread. To achieve parallelism, I added a maxDegreeOfParallelism parameter that's passed to a Parallel.ForEach loop re-purposed as a while loop.
The old version was:
foreach (var master = ...)
{
var details = ...;
Parallel.ForEach(details, detail => {
// Process each detail record here
}, new ParallelOptions { MaxDegreeOfParallelism = 15 });
// Perform the final batch updates here
}
But the thread pool gets exhausted fast, and you can't do async/await.
Bonus:
To get around the problem in BlockingCollection where an exception is thrown in Take() when CompleteAdding() is called, I'm using the TryTake overload with a timeout. If I didn't use the timeout in TryTake, it would defeat the purpose of using a BlockingCollection since TryTake won't block. Is there a better way? Ideally, there would be a TakeAsync method.
As suggested, use TPL Dataflow.
A TransformBlock<TInput, TOutput> may be what you're looking for.
You define a MaxDegreeOfParallelism to limit how many strings can be transformed (i.e., how many URLs can be downloaded) in parallel. You then post URLs to the block; when you're done, you tell the block you're done adding items, and you fetch the responses.
var downloader = new TransformBlock<string, HttpResponse>(
url => Download(url),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 50 }
);
var buffer = new BufferBlock<HttpResponse>();
downloader.LinkTo(buffer);
foreach(var url in urls)
downloader.Post(url);
//or await downloader.SendAsync(url);
downloader.Complete();
await downloader.Completion;
IList<HttpResponse> responses;
if (buffer.TryReceiveAll(out responses))
{
//process responses
}
Note: The TransformBlock buffers both its input and output. Why, then, do we need to link it to a BufferBlock?
Because the TransformBlock won't complete until all items (HttpResponse) have been consumed, so await downloader.Completion would hang. Instead, we let the downloader forward all its output to a dedicated buffer block, then we wait for the downloader to complete and inspect the buffer block.
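To see the linking behaviour in isolation, here is a minimal sketch with integers instead of HTTP responses. It assumes the System.Threading.Tasks.Dataflow library is referenced; the squaring function and 10 ms delay are placeholders for the download.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TransformBufferSketch
{
    // Squares each input with at most maxParallel transforms running at once.
    // Output is forwarded to a BufferBlock so the TransformBlock's own output buffer
    // drains and its Completion task can finish.
    public static async Task<IList<int>> SquareAllAsync(IEnumerable<int> inputs, int maxParallel)
    {
        var transform = new TransformBlock<int, int>(async x =>
        {
            await Task.Delay(10); // placeholder for the download
            return x * x;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        var buffer = new BufferBlock<int>();
        transform.LinkTo(buffer);

        foreach (var x in inputs)
            await transform.SendAsync(x);

        transform.Complete();
        await transform.Completion; // finishes because the buffer accepts every output

        // Everything the transform produced is now sitting in the buffer.
        return buffer.TryReceiveAll(out var results) ? results : new List<int>();
    }
}
```

TransformBlock preserves input order in its output by default, so the results come back in the order the inputs were sent.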
Say you have 1000 URLs, and you only want to have 50 requests open at a time; but as soon as one request completes, you open up a connection to the next URL in the list. That way, there are always exactly 50 connections open at a time, until the URL list is exhausted.
The following simple solution has surfaced many times here on SO. It doesn't use blocking code and doesn't create threads explicitly, so it scales very well:
const int MAX_DOWNLOADS = 50;
static async Task DownloadAsync(string[] urls)
{
using (var semaphore = new SemaphoreSlim(MAX_DOWNLOADS))
using (var httpClient = new HttpClient())
{
var tasks = urls.Select(async url =>
{
await semaphore.WaitAsync();
try
{
var data = await httpClient.GetStringAsync(url);
Console.WriteLine(data);
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(tasks);
}
}
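To check that the semaphore really caps the number of operations in flight, here is a sketch of the same pattern that also records the peak concurrency. It is generic over the work function, so `work` stands in for the HttpClient call; the peak counter exists only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottleSketch
{
    // Runs work(item) for every item with at most maxConcurrent calls in flight,
    // and returns the highest number of overlapping calls that was observed.
    public static async Task<int> RunThrottledAsync<T>(
        IEnumerable<T> items, Func<T, Task> work, int maxConcurrent)
    {
        using var semaphore = new SemaphoreSlim(maxConcurrent);
        var gate = new object();
        int active = 0, peak = 0;

        var tasks = items.Select(async item =>
        {
            await semaphore.WaitAsync();
            try
            {
                lock (gate) { active++; peak = Math.Max(peak, active); }
                await work(item);
            }
            finally
            {
                lock (gate) { active--; }
                semaphore.Release();
            }
        }).ToList(); // start every task exactly once

        await Task.WhenAll(tasks);
        return peak;
    }
}
```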
The thing is, the processing of the downloaded data should be done on a different pipeline, with a different level of parallelism, especially if it's a CPU-bound processing.
E.g., you'd probably want to have 4 threads concurrently doing the data processing (the number of CPU cores), and up to 50 pending requests for more data (which do not use threads at all). AFAICT, this is not what your code is currently doing.
That's where TPL Dataflow or Rx may come in handy as a preferred solution. Yet it is certainly possible to implement something like this with plain TPL. Note, the only blocking code here is the one doing the actual data processing inside Task.Run:
const int MAX_DOWNLOADS = 50;
const int MAX_PROCESSORS = 4;
// process data
class Processing
{
SemaphoreSlim _semaphore = new SemaphoreSlim(MAX_PROCESSORS);
HashSet<Task> _pending = new HashSet<Task>();
object _lock = new Object();
async Task ProcessAsync(string data)
{
await _semaphore.WaitAsync();
try
{
await Task.Run(() =>
{
// simulate work
Thread.Sleep(1000);
Console.WriteLine(data);
});
}
finally
{
_semaphore.Release();
}
}
public async void QueueItemAsync(string data)
{
var task = ProcessAsync(data);
lock (_lock)
_pending.Add(task);
try
{
await task;
}
catch
{
if (!task.IsCanceled && !task.IsFaulted)
throw; // not the task's exception, rethrow
// don't remove faulted/cancelled tasks from the list
return;
}
// remove successfully completed tasks from the list
lock (_lock)
_pending.Remove(task);
}
public async Task WaitForCompleteAsync()
{
Task[] tasks;
lock (_lock)
tasks = _pending.ToArray();
await Task.WhenAll(tasks);
}
}
// download data
static async Task DownloadAsync(string[] urls)
{
var processing = new Processing();
using (var semaphore = new SemaphoreSlim(MAX_DOWNLOADS))
using (var httpClient = new HttpClient())
{
var tasks = urls.Select(async (url) =>
{
await semaphore.WaitAsync();
try
{
var data = await httpClient.GetStringAsync(url);
// put the result on the processing pipeline
processing.QueueItemAsync(data);
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(tasks.ToArray());
await processing.WaitForCompleteAsync();
}
}
As requested, here's the code I ended up going with.
The work is set up in a master-detail configuration, and each master is processed as a batch. Each unit of work is queued up in this fashion:
var success = true;
// Start processing all the master records.
Master master;
while (null != (master = await StoredProcedures.ClaimRecordsAsync(...)))
{
await masterBuffer.SendAsync(master);
}
// Finished sending master records
masterBuffer.Complete();
// Now, wait for all the batches to complete.
await batchAction.Completion;
return success;
Masters are buffered one at a time to save work for other outside processes. The details for each master are dispatched for work via the masterTransform TransformManyBlock. A BatchedJoinBlock is also created to collect the details in one batch.
The actual work is done in the detailTransform TransformBlock, asynchronously, 150 at a time. BoundedCapacity is set to 300 to ensure that too many Masters don't get buffered at the beginning of the chain, while also leaving room for enough detail records to be queued to allow 150 records to be processed at one time. The block outputs an object to its targets, because it's filtered across the links depending on whether it's a Detail or Exception.
The batchAction ActionBlock collects the output from all the batches, and performs bulk database updates, error logging, etc. for each batch.
There will be several BatchedJoinBlocks, one for each master. Since each ISourceBlock is output sequentially and each batch only accepts the number of detail records associated with one master, the batches will be processed in order. Each block only outputs one group, and is unlinked on completion. Only the last batch block propagates its completion to the final ActionBlock.
The dataflow network:
// The dataflow network
BufferBlock<Master> masterBuffer = null;
TransformManyBlock<Master, Detail> masterTransform = null;
TransformBlock<Detail, object> detailTransform = null;
ActionBlock<Tuple<IList<object>, IList<object>>> batchAction = null;
// Buffer master records to enable efficient throttling.
masterBuffer = new BufferBlock<Master>(new DataflowBlockOptions { BoundedCapacity = 1 });
// Sequentially transform master records into a stream of detail records.
masterTransform = new TransformManyBlock<Master, Detail>(async masterRecord =>
{
var records = await StoredProcedures.GetObjectsAsync(masterRecord);
// Filter the master records based on some criteria here
var filteredRecords = records;
// Only propagate completion to the last batch
var propagateCompletion = masterBuffer.Completion.IsCompleted && masterTransform.InputCount == 0;
// Create a batch join block to encapsulate the results of the master record.
var batchjoinblock = new BatchedJoinBlock<object, object>(records.Count(), new GroupingDataflowBlockOptions { MaxNumberOfGroups = 1 });
// Add the batch block to the detail transform pipeline's link queue, and link the batch block to the batch action block.
var detailLink1 = detailTransform.LinkTo(batchjoinblock.Target1, detailResult => detailResult is Detail);
var detailLink2 = detailTransform.LinkTo(batchjoinblock.Target2, detailResult => detailResult is Exception);
var batchLink = batchjoinblock.LinkTo(batchAction, new DataflowLinkOptions { PropagateCompletion = propagateCompletion });
// Unlink batchjoinblock upon completion.
// (the returned task does not need to be awaited, despite the warning.)
batchjoinblock.Completion.ContinueWith(task =>
{
detailLink1.Dispose();
detailLink2.Dispose();
batchLink.Dispose();
});
return filteredRecords;
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
// Process each detail record asynchronously, 150 at a time.
detailTransform = new TransformBlock<Detail, object>(async detail => {
try
{
// Perform the action for each detail here asynchronously
await DoSomethingAsync();
return detail;
}
catch (Exception e)
{
success = false;
return e;
}
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 150, BoundedCapacity = 300 });
// Perform the proper action for each batch
batchAction = new ActionBlock<Tuple<IList<object>, IList<object>>>(async batch =>
{
var details = batch.Item1.Cast<Detail>();
var errors = batch.Item2.Cast<Exception>();
// Do something with the batch here
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
masterBuffer.LinkTo(masterTransform, new DataflowLinkOptions { PropagateCompletion = true });
masterTransform.LinkTo(detailTransform, new DataflowLinkOptions { PropagateCompletion = true });

C# Running many async tasks the same time

I'm kinda new to async tasks.
I have a function that takes a student ID and scrapes data for that ID from a specific university website.
private static HttpClient client = new HttpClient();
public static async Task<Student> ParseAsync(string departmentLink, int id, CancellationToken ct)
{
string website = string.Format(departmentLink, id);
try
{
string data;
var stream = await client.GetAsync(website, ct);
using (var reader = new StreamReader(await stream.Content.ReadAsStreamAsync(), Encoding.GetEncoding("windows-1256")))
data = reader.ReadToEnd();
//Parse data here and return Student.
} catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
And it works correctly. Sometimes though I need to run this function for a lot of students so I use the following
for(int i = ids.first; i <= ids.last; i++)
{
tasks[i - ids.first] = ParseStudentData.ParseAsync(entity.Link, i, cts.Token).ContinueWith(t =>
{
Dispatcher.Invoke(() =>
{
listview_students.Items.Add(t.Result);
//Students.Add(t.Result);
//lbl_count.Content = $"{listview_students.Items.Count}/{testerino.Length}";
});
});
}
I'm storing tasks in an array to wait for them later.
This also works fine as long as the student count is between 0 and ~600 (the exact number is kinda random).
After that, every student that hasn't been parsed yet throws "A task was cancelled".
Keep in mind that I never use the cancellation token at all.
I need to run this function on so many students that it can reach ~9000 async tasks altogether. So what's happening?
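You are basically creating a denial-of-service attack on the website when you queue up 9000 requests in such a short time frame. Not only is this causing you errors, but it could take down the website. The "A task was cancelled" errors are most likely HttpClient's Timeout (100 seconds by default) expiring for requests that were still waiting their turn, since HttpClient reports a timeout as a TaskCanceledException. It would be best to limit the number of concurrent requests to a more reasonable value (say 30). While there are probably several ways to do this, one that comes to mind is the following: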
private async Task Test()
{
var tasks = new List<Task>();
for (int i = ids.first; i <= ids.last; i++)
{
tasks.Add(/* Do stuff */);
await WaitList(tasks, 30);
}
// Wait for the requests that are still running.
await Task.WhenAll(tasks);
}
private async Task WaitList(IList<Task> tasks, int maxSize)
{
while (tasks.Count > maxSize)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
Other approaches might leverage the producer/consumer pattern using .NET classes such as BlockingCollection.
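As one such approach, here is a sketch of an async producer/consumer built on System.Threading.Channels instead of BlockingCollection (a swapped-in technique, available in-box from .NET Core 3.0 and as a NuGet package elsewhere). A bounded channel holds pending ids and a fixed number of consumer tasks drain it, so at most `workers` requests are in flight. `processAsync` stands in for ParseAsync, and error handling is omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelWorkersSketch
{
    public static async Task<List<TResult>> RunAsync<TResult>(
        IEnumerable<int> ids, Func<int, Task<TResult>> processAsync, int workers)
    {
        // Small bounded buffer: the producer waits when consumers fall behind.
        var channel = Channel.CreateBounded<int>(workers * 2);
        var results = new ConcurrentBag<TResult>();

        // Fixed pool of consumers: at most `workers` calls to processAsync run at once.
        var consumers = Enumerable.Range(0, workers).Select(async _ =>
        {
            // ReadAllAsync ends once the writer is completed and the channel is empty.
            await foreach (var item in channel.Reader.ReadAllAsync())
                results.Add(await processAsync(item));
        }).ToList();

        // Producer: WriteAsync waits asynchronously while the buffer is full.
        foreach (var id in ids)
            await channel.Writer.WriteAsync(id);
        channel.Writer.Complete();

        await Task.WhenAll(consumers);
        return results.ToList();
    }
}
```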
This is what I ended up with, based on @erdomke's code:
public static async Task ForEachParallel<T>(
this IEnumerable<T> list,
Func<T, Task> action,
int dop)
{
var tasks = new List<Task>(dop);
foreach (var item in list)
{
tasks.Add(action(item));
while (tasks.Count >= dop)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
// Wait for all remaining tasks.
await Task.WhenAll(tasks).ConfigureAwait(false);
}
// usage
await Enumerable
.Range(1, 500)
.ForEachParallel(i => ProcessItem(i), Environment.ProcessorCount);
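On .NET 6 and later, the framework has a built-in equivalent, Parallel.ForEachAsync, which takes the degree of parallelism through ParallelOptions.MaxDegreeOfParallelism. A minimal sketch, with Task.Delay standing in for the hypothetical ProcessItem:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ForEachAsyncSketch
{
    // Processes every item with at most `dop` bodies running at once (.NET 6+).
    public static async Task<int> ProcessAllAsync(IEnumerable<int> items, int dop)
    {
        var processed = new ConcurrentBag<int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = dop };

        await Parallel.ForEachAsync(items, options, async (item, ct) =>
        {
            await Task.Delay(10, ct); // stand-in for ProcessItem(item)
            processed.Add(item);
        });

        return processed.Count;
    }
}
```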

Limit total concurrent tasks running [duplicate]

This question already has answers here:
How to limit the amount of concurrent async I/O operations?
(11 answers)
Closed 5 years ago.
I have a method Create which is executed whenever a new message is seen on the service bus message queue (https://azure.microsoft.com/en-us/services/service-bus/).
I am trying to limit the total number of concurrent tasks that can run in parallel for all calls of Create to 5 tasks.
In my code Parallel.ForEach does not seem to do anything.
I have tried to add a mutex/lock around the makePdfAsync() invocation like this:
mutex.WaitOne();
if(currentNumTasks < MaxTasks)
{
tasks.Add(makePdfAsync(form));
}
mutex.ReleaseMutex();
but it is extremely slow and makes the service bus throw.
How do I limit the number of concurrent tasks created across all invocations of Create?
public async Task Create(List<FormModel> forms)
{
var tasks = new List<Task>();
Parallel.ForEach(forms, new ParallelOptions { MaxDegreeOfParallelism = 5 }, form =>
{
tasks.Add(makePdfAsync(form));
});
await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(TimeSpan.FromMinutes(10)));
}
public async Task makePdfAsync(FormModel form)
{
var message = new PdfMessageModel();
message.forms = new List<FormModel>() { form };
var retry = 10;
var uri = new Uri("http://localhost.:8007");
var json = JsonConvert.SerializeObject(message);
using (var wc = new WebClient())
{
wc.Encoding = System.Text.Encoding.UTF8;
// reconnect with delay in case process is not ready
while (true)
{
try
{
await wc.UploadStringTaskAsync(uri, json);
break;
}
catch
{
if (retry-- == 0) throw;
}
}
}
}
TL;DR: Create is a method on a class, and it is called on many instances simultaneously. The concurrency is twofold: several invocations of Create run simultaneously, and within each invocation of Create several tasks run concurrently.
How do I limit the total number of tasks running at any one point?
You could look at using a system-wide (named) semaphore, for example:
var throttle = new Semaphore(5,5,"pdftaskthrottle");
if (throttle.WaitOne(5000)){
try{
//do some task / thread stuff
.....
} catch(Exception ex){
// handle
} finally {
//always remember to release the semaphore
throttle.Release();
}
} else {
//we timed out ... try again?
}
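If all the Create calls run in the same process, a static SemaphoreSlim is a lighter, awaitable alternative to a named system-wide Semaphore (which is mainly useful across processes). The sketch below assumes that setup: MakePdfAsync is a hypothetical stand-in for the question's makePdfAsync, forms are plain strings instead of FormModel, and the peak counter exists only to show that the cap of 5 holds across instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class PdfCreatorSketch
{
    // static: one semaphore for the whole process, so the cap applies across
    // every instance and every concurrent Create call, not per call.
    private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(5);
    private static readonly object Gate = new object();
    private static int _active, _peak;

    public static int Peak { get { lock (Gate) return _peak; } }

    public async Task Create(IEnumerable<string> forms)
    {
        await Task.WhenAll(forms.Select(MakePdfThrottledAsync));
    }

    private async Task MakePdfThrottledAsync(string form)
    {
        await Throttle.WaitAsync(); // waits without blocking a thread
        try
        {
            lock (Gate) { _active++; _peak = Math.Max(_peak, _active); }
            await MakePdfAsync(form);
        }
        finally
        {
            lock (Gate) { _active--; }
            Throttle.Release();
        }
    }

    // Hypothetical stand-in for the question's makePdfAsync(form).
    private Task MakePdfAsync(string form) => Task.Delay(20);
}
```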
If I understand you correctly, you effectively want a producer/consumer queue with a limit of 5 tasks. BlockingCollection would be the best fit if that's what you're after. It performs very well, since internally it uses SemaphoreSlim to block when necessary. You can also combine it with Task, e.g. by creating a BlockingCollection<Task<T>>. "C# in a Nutshell" has a good section on this; see the code below as a general example. Also, try to avoid kernel-mode synchronisation constructs such as Mutex where possible, as they're slow (every call pays for a transition from managed code into the kernel).
class PCQueue : IDisposable
{
private BlockingCollection<Task> _taskQueue = new BlockingCollection<Task>();
public PCQueue(int workerCount)
{
for (int i = 0; i < workerCount; i++)
Task.Factory.StartNew(Consume);
}
public Task Enqueue(Action action, CancellationToken cancelToken = default(CancellationToken))
{
//! A task object can either be generated using TaskCompletionSource or instantiated directly (an unstarted or cold task!).
var task = new Task(action, cancelToken);
_taskQueue.Add(task); //? Create a cold task and enqueue it.
return task;
}
public Task<TResult> Enqueue<TResult>(Func<TResult> func, CancellationToken cancelToken = default(CancellationToken))
{
var task = new Task<TResult>(func, cancelToken);
_taskQueue.Add(task);
return task;
}
void Consume()
{
foreach (var task in _taskQueue.GetConsumingEnumerable())
{
try
{
//! We run the task synchronously on the consumer's thread.
if (!task.IsCanceled) task.RunSynchronously();
}
catch (InvalidOperationException)
{
//! Handle the unlikely event that the task is canceled in between checking whether it's canceled and running it.
// race condition!
}
}
}
public void Dispose() => _taskQueue.CompleteAdding();
}

UWP: how to optimize synchronization between WCF WebServices and SQLite through async calls

I must synchronize data coming from WCF WebServices into a SQLite database.
This synchronization involves a dozen WebServices, which can be "grouped" into 4 categories:
"User's" Rights
"Forms" data, which can be updated from both sides (user/server)
"Server" data, which are only updated from the server
"Views" data, which store local copies of the server's views
Each call to the WebService is done through HttpClient:
response = await client.PostAsync(webServiceName, content);
Each WebService has its own async method where the WebService response is deserialized:
public static async Task<string> PushForm(List<KeyValuePair<string, string>> parameters)
{
var response = await JsonParser.GetJsonFromUrl(WebServiceName.PushForm.Value, parameters);
Forms forms = new Forms();
try
{
forms = JsonConvert.DeserializeObject<Forms>(response);
return response;
}
catch (Exception e)
{
throw new Exception(e.Message);
}
}
Then I have a SynchronizationService class, where I regroup the calls to WebServices by categories:
public async Task<bool> SynchronizeServerData()
{
bool result = false;
try
{
result = true;
List<Table1> tables1 = await WebServices.GetListTable1(null);
if (tables1 != null)
{
ServiceLocator.Current.GetInstance<IRepository>().DeleteAll<Table1>(true);
ServiceLocator.Current.GetInstance<IRepository>().AddAll<Table1>(tables1);
}
List<Table2> tables2 = await WebServices.GetListTable2(null);
if (tables2 != null)
{
ServiceLocator.Current.GetInstance<IRepository>().DeleteAll<Table2>(true);
ServiceLocator.Current.GetInstance<IRepository>().AddAll<Table2>(tables2);
}
List<Table3> tables3 = await WebServices.GetListTable3(null);
if (tables3 != null)
{
ServiceLocator.Current.GetInstance<IRepository>().DeleteAll<Table3>(true);
ServiceLocator.Current.GetInstance<IRepository>().AddAll<Table3>(tables3);
}
...
}
catch (Exception e)
{
result = false;
}
return result;
}
And finally, in the main ViewModel, I call each of these methods:
public async void SynchronizeData(bool firstSync)
{
IsBusy = true;
var resUsers = await _synchronisation.SynchronizeUsersRights();
var resServer = await _synchronisation.SynchronizeServerData();
var resForm = await _synchronisation.SynchronizeForms();
var resViews = await _synchronisation.SynchronizeViews();
IsBusy = false;
}
But due to the use of "await", the performance is not good.
=> I would like to know if there is an easy way to "parallelize" the calls to optimize performance. Or is it possible to separate the data retrieval from the SQLite update for this?
On the face of it there appear to be opportunities to run a few things concurrently, for instance in SynchronizeData you could do this instead:
{
IsBusy = true;
// Assumes each Synchronize* method returns Task<bool>, like SynchronizeServerData.
Task<bool> resUsersTask = _synchronisation.SynchronizeUsersRights();
Task<bool> resServerTask = _synchronisation.SynchronizeServerData();
Task<bool> resFormTask = _synchronisation.SynchronizeForms();
Task<bool> resViewsTask = _synchronisation.SynchronizeViews();
await Task.WhenAll(resUsersTask, resServerTask, resFormTask, resViewsTask);
var resUsers = resUsersTask.Result;
var resServer = resServerTask.Result;
var resForm = resFormTask.Result;
var resViews = resViewsTask.Result;
IsBusy = false;
}
...which would allow those 4 tasks to run concurrently.
You could do the same in SynchronizeServerData; e.g.:
result = true;
Task<List<Table1>> tables1Task = WebServices.GetListTable1(null);
Task<List<Table2>> tables2Task = WebServices.GetListTable2(null);
Task<List<Table3>> tables3Task = WebServices.GetListTable3(null);
List<Table1> tables1 = await tables1Task;
// ...
List<Table2> tables2 = await tables2Task;
// ...
List<Table3> tables3 = await tables3Task;
// ...
This would allow the 3 tasks to run concurrently as well.
How much you actually gain from this might depend on things like whether SQLite allows multiple concurrent requests; I don't know the answer to that offhand.
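A runnable sketch of the start-then-await pattern, with a hypothetical FakeGetListAsync (300 ms per call) standing in for WebServices.GetListTable1/2/3. The task variables are typed Task<List<...>>: a plain Task carries no result, so awaiting it would not give you the list.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ConcurrentFetchSketch
{
    // Hypothetical stand-in for a web service call: waits 300 ms, returns `size` rows.
    private static async Task<List<int>> FakeGetListAsync(int size)
    {
        await Task.Delay(300);
        return new List<int>(new int[size]);
    }

    public static async Task<(int Total, TimeSpan Elapsed)> FetchAllAsync()
    {
        var sw = Stopwatch.StartNew();

        // Start all three calls before awaiting any of them.
        Task<List<int>> tables1Task = FakeGetListAsync(1);
        Task<List<int>> tables2Task = FakeGetListAsync(2);
        Task<List<int>> tables3Task = FakeGetListAsync(3);

        // The waits overlap, so this takes about 300 ms rather than 900 ms.
        List<int> tables1 = await tables1Task;
        List<int> tables2 = await tables2Task;
        List<int> tables3 = await tables3Task;

        // The SQLite writes (DeleteAll/AddAll) would go here, one table at a time.
        return (tables1.Count + tables2.Count + tables3.Count, sw.Elapsed);
    }
}
```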
Some other comments:
public async void SynchronizeData(bool firstSync)
async void is almost always incorrect, except for event handlers and rare fire-and-forget methods, and usually async Task is what you want. Search SO for any number of good answers on why.
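A short sketch of why async Task is preferred: because the method returns a Task, the caller can await it and catch its exception. SynchronizeDataAsync here is a hypothetical stand-in, not the question's method.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncTaskVsVoidSketch
{
    // Returning Task lets callers await completion and observe failures.
    // With `async void` there is nothing to await; an unhandled exception is re-raised on the
    // SynchronizationContext that was current when the method started (or on the thread pool),
    // which typically brings the app down unless a global handler catches it.
    public static async Task SynchronizeDataAsync(bool fail)
    {
        await Task.Delay(10);
        if (fail) throw new InvalidOperationException("sync failed");
    }

    public static async Task<string> CallerAsync()
    {
        try
        {
            await SynchronizeDataAsync(fail: true);
            return "ok";
        }
        catch (InvalidOperationException ex)
        {
            return ex.Message; // the exception surfaces here because the method returns Task
        }
    }
}
```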
Lastly, what you really should do is profile the code to see where your real bottlenecks are.

Async/Await or Task.Run in Console Application/Windows Service

I have been researching (including looking at all other SO posts on this topic) the best way to implement a (most likely) Windows Service worker that will pull items of work from a database and process them in parallel asynchronously in a 'fire-and-forget' manner in the background (the work item management will all be handled in the asynchronous method). The work items will be web service calls and database queries. There will be some throttling applied to the producer of these work items to ensure some kind of measured approach to scheduling the work. The examples below are very basic and are just there to highlight the logic of the while loop and for loop in place. Which is the ideal method or does it not matter? Is there a more appropriate/performant way of achieving this?
async/await...
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Async";
Task.Run(() => AsyncMain());
Console.ReadLine();
}
private static async void AsyncMain()
{
while (true)
{
// Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = DoSomethingAsync(counter.ToString());
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static async Task<string> DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
return "fire and forget so not really interested";
}
Task.Run...
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Task";
while (true)
{
// Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = Task.Run(() => { DoSomethingAsync(counter.ToString()); });
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static string DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
return "fire and forget so not really interested";
}
pull items of work from a database and process them in parallel asynchronously in a 'fire-and-forget' manner in the background
Technically, you want concurrency. Whether you want asynchronous concurrency or parallel concurrency remains to be seen...
The work items will be web service calls and database queries.
The work is I/O-bound, so that implies asynchronous concurrency as the more natural approach.
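The sketch below contrasts the two kinds of concurrency. Task.Delay stands in for a hypothetical web-service call or database query. The sum of squares is an arbitrary CPU-bound placeholder. Neither comes from the question.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyKinds
{
    // Asynchronous concurrency (I/O-bound): Task.Delay stands in for a hypothetical
    // web-service call or database query; no thread is blocked while it waits.
    public static async Task<TimeSpan> RunIoBoundAsync(int jobCount, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        var jobs = Enumerable.Range(0, jobCount).Select(_ => Task.Delay(delayMs));
        await Task.WhenAll(jobs); // the delays overlap, so this takes about delayMs in total
        return stopwatch.Elapsed;
    }

    // Parallel concurrency (CPU-bound): Parallel.For spreads the iterations
    // across thread-pool threads, roughly one per core.
    public static long SumOfSquaresParallel(int count)
    {
        long total = 0;
        Parallel.For(0, count,
            () => 0L,                                   // per-thread running total
            (i, _, local) => local + (long)i * i,       // CPU-bound work per item
            local => Interlocked.Add(ref total, local)); // merge per-thread totals
        return total;
    }
}
```

With five 500 ms "calls", RunIoBoundAsync finishes in roughly 500 ms rather than 2.5 s, without dedicating a thread to each call.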
There will be some throttling applied to the producer of these work items to ensure some kind of measured approach to scheduling the work.
The idea of a producer/consumer queue is implied here. That's one option. TPL Dataflow provides some nice producer/consumer queues that are async-compatible and support throttling.
Alternatively, you can do the throttling yourself. For asynchronous code, there's a built-in throttling mechanism called SemaphoreSlim.
TPL Dataflow approach, with throttling:
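using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // TPL Dataflow; may require the System.Threading.Tasks.Dataflow NuGet package
private static int counter = 1;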
static void Main(string[] args)
{
Console.Title = "Async";
var x = Task.Run(() => MainAsync());
Console.ReadLine();
}
private static async Task MainAsync()
{
var blockOptions = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 7
};
var block = new ActionBlock<string>(DoSomethingAsync, blockOptions);
while (true)
{
var dbData = await ...; // Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
block.Post(counter.ToString());
counter++;
await Task.Delay(50); // don't block a thread-pool thread inside an async method
}
await Task.Delay(1000);
}
}
private static async Task DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
}
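The block above throttles only the consumer side, through MaxDegreeOfParallelism. Its input buffer is unbounded, so Post always accepts new items. If the producer should also slow down when the consumers fall behind, Dataflow offers BoundedCapacity on the block options, used together with SendAsync instead of Post.

The sketch below shows the same bounded producer/consumer idea with System.Threading.Channels instead. This is a swapped-in technique, not the approach this answer uses; Channels is built into .NET Core 3.0+ and available as a NuGet package for older targets. The capacity, consumer count, and names are illustrative assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class BoundedPipeline
{
    // Pushes jobCount hypothetical job numbers through a bounded channel with
    // consumerCount workers and returns the job numbers that were processed.
    public static async Task<string[]> RunAsync(int jobCount, int capacity, int consumerCount)
    {
        // WriteAsync waits (without blocking a thread) while 'capacity' items are queued.
        var channel = Channel.CreateBounded<string>(capacity);
        var processed = new ConcurrentBag<string>();

        // Consumers: each reads until the channel is completed and drained.
        var consumers = Enumerable.Range(0, consumerCount).Select(async _ =>
        {
            await foreach (var jobNumber in channel.Reader.ReadAllAsync())
            {
                await Task.Delay(20); // simulated I/O work
                processed.Add(jobNumber);
            }
        }).ToArray();

        // Producer: throttled automatically by the channel's capacity.
        for (int i = 1; i <= jobCount; i++)
        {
            await channel.Writer.WriteAsync(i.ToString());
        }
        channel.Writer.Complete(); // no more items; consumers exit once the queue is empty

        await Task.WhenAll(consumers);
        return processed.ToArray();
    }
}
```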
Asynchronous concurrency approach with manual throttling:
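using System;
using System.Threading;
using System.Threading.Tasks;
private static int counter = 1;
private static readonly SemaphoreSlim semaphore = new SemaphoreSlim(7); // at most 7 jobs run at once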
static void Main(string[] args)
{
Console.Title = "Async";
var x = Task.Run(() => MainAsync());
Console.ReadLine();
}
private static async Task MainAsync()
{
while (true)
{
var dbData = await ...; // Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = DoSomethingAsync(counter.ToString());
counter++;
await Task.Delay(50); // don't block a thread-pool thread inside an async method
}
await Task.Delay(1000);
}
}
private static async Task DoSomethingAsync(string jobNumber)
{
await semaphore.WaitAsync();
try
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
}
finally
{
semaphore.Release();
}
}
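This small self-contained sketch, which is not part of the original answer, checks that SemaphoreSlim really caps concurrency. It starts every job at once and records the peak number running simultaneously. Task.Delay stands in for the real I/O, and the job count and limit are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleCheck
{
    // Starts jobCount hypothetical jobs at once but lets at most maxConcurrency run at a time.
    // Returns the highest number of jobs observed running simultaneously.
    public static async Task<int> RunAsync(int jobCount, int maxConcurrency)
    {
        var semaphore = new SemaphoreSlim(maxConcurrency);
        var gate = new object();
        int running = 0, peak = 0;

        async Task JobAsync()
        {
            await semaphore.WaitAsync(); // waits asynchronously once the limit is reached
            lock (gate)
            {
                running++;
                peak = Math.Max(peak, running);
            }
            try
            {
                await Task.Delay(50); // simulated I/O work
            }
            finally
            {
                lock (gate) { running--; }
                semaphore.Release();
            }
        }

        await Task.WhenAll(Enumerable.Range(0, jobCount).Select(_ => JobAsync()));
        return peak;
    }
}
```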
As a final note, I hardly ever recommend my own book on SO, but I do think it would really benefit you. In particular, see sections 8.10 (Blocking/Asynchronous Queues), 11.5 (Throttling), and 4.4 (Throttling Dataflow Blocks).
First of all, let's fix a few things.
In the second example you are calling
Task.Delay(5000);
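without await. That is a bad idea: it creates a new Task that completes after 5 seconds, but nothing waits for it, so the method carries straight on. Task.Delay is only useful with await. Mind you, do not switch to Task.Delay(5000).Wait() either. It blocks the thread just like Thread.Sleep, and blocking on tasks with .Wait() is a common cause of deadlocks once a synchronization context is involved.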
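A quick sketch of the difference, using an arbitrary 500 ms delay: without await, Task.Delay returns at once and nothing waits; with await, the method really pauses.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

public static class DelayDemo
{
    // Without await, Task.Delay only creates a timer-backed Task; execution continues immediately.
    public static long UnawaitedDelayMs()
    {
        var sw = Stopwatch.StartNew();
        Task.Delay(500); // the returned Task is ignored, so nothing actually waits
        return sw.ElapsedMilliseconds;
    }

    // Awaiting the Task pauses this method (without blocking a thread) until the delay elapses.
    public static async Task<long> AwaitedDelayMsAsync()
    {
        var sw = Stopwatch.StartNew();
        await Task.Delay(500);
        return sw.ElapsedMilliseconds;
    }
}
```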
In your second example you are trying to make the DoSomethingAsync method synchronous, so let's call it DoSomethingSync and replace Task.Delay(5000); with Thread.Sleep(5000);
Now the second example is almost the old-school ThreadPool.QueueUserWorkItem, and there is nothing wrong with that as long as you are not using an already-async API inside. Used for fire-and-forget, Task.Run and ThreadPool.QueueUserWorkItem are essentially the same thing: both queue the delegate to the thread pool. I would use the latter for clarity.
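This sketch queues the synchronous variant both ways. It assumes a shortened sleep and returns a string in place of the question's Log helper. The CountdownEvent exists only so the demo can check the results; genuine fire-and-forget code would not wait.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForget
{
    // Synchronous version: Thread.Sleep really blocks the thread-pool thread.
    // sleepMs is shortened from 5000 for illustration.
    public static string DoSomethingSync(string jobNumber, int sleepMs)
    {
        Thread.Sleep(sleepMs); // simulated blocking work
        return "job " + jobNumber + " has completed";
    }

    // Queues the same synchronous job both ways; both hand the delegate to the thread pool.
    public static string[] RunBothWays()
    {
        var results = new string[2];
        var done = new CountdownEvent(2); // only here so the demo can inspect results

        ThreadPool.QueueUserWorkItem(_ =>
        {
            results[0] = DoSomethingSync("1", 100);
            done.Signal();
        });

        Task.Run(() =>
        {
            results[1] = DoSomethingSync("2", 100);
            done.Signal();
        });

        done.Wait(); // true fire-and-forget code would not wait here
        return results;
    }
}
```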
This slowly brings us to the answer to the main question. Async or not async, that is the question! I would say: "Do not create async methods unless you have to use some async IO inside your code." If, however, there is an async API you have to use, then the first approach is what people reading your code years later will expect.
