I have a simple Metro-style app that's giving me an issue with async and await.
List<string> fileNames = new List<string>();
...
...
LoadList();
...
...
// (Problem) Code that accesses the elements of the fileNames List
...
...
private async void LoadList()
{
// Code that loops through a directory and adds the
// file names to the fileNames List using GetFilesAsync()
}
The problem is that the fileNames List is accessed prematurely, before it is fully loaded with items. This is because LoadList is an async void method: the program continues with the next line of code while the async method is still running.
How can I access the List after it is fully loaded (After the async method is done)?
Is there a way to accomplish what I'm trying to do without using async in Metro apps?
You need the calling method to be asynchronous too, and rather than keeping a fileNames field, I'd make the LoadList method return the list. So you'd have:
public async Task ProcessFiles()
{
    List<string> fileNames = await LoadList();
    // Now process the files
}

public async Task<List<string>> LoadList()
{
    List<string> fileNames = new List<string>();
    // Do stuff...
    return fileNames;
}
This does mean that you need to wait for all the files to be found before you start processing them; if you want to process them as you find them you'll need to think about using a BlockingCollection of some kind. EDIT: As Stephen points out, TPL Dataflow would be a great fit here too.
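For reference, here is a minimal sketch of the "process them as you find them" idea using System.Threading.Channels as the producer/consumer buffer (a modern alternative to BlockingCollection); the folder enumeration and the processing step are hypothetical placeholders, not the asker's real code:

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public class FileProcessor
{
    public async Task ProcessFilesAsStreamAsync()
    {
        var channel = Channel.CreateUnbounded<string>();

        // Producer: discovers file names and writes them to the channel as they are found.
        Task producer = Task.Run(async () =>
        {
            // Replace with the real enumeration (e.g. GetFilesAsync on a StorageFolder).
            foreach (string name in new[] { "a.txt", "b.txt" })
            {
                await channel.Writer.WriteAsync(name);
            }
            channel.Writer.Complete();
        });

        // Consumer: processes each file name as soon as it arrives.
        await foreach (string name in channel.Reader.ReadAllAsync())
        {
            Console.WriteLine($"Processing {name}");
        }

        await producer; // Propagate any producer exception.
    }
}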
Related
I have a simple API and one service. The service reads a tree structure from a path taken from configuration. The problem is that the tree can be rather large, so I thought I could solve this by creating a collection of tasks and resolving those tasks as a stream on the ActionResult. To make things harder, I need the whole tree as the result; I can't split it across different requests.
So normally I would get the file tree like this:
public IEnumerable<string> GetFiles()
{
var result = new List<string>();
foreach (var resource in _root)
{
this.ValidateRootFolder(resource);
result.AddRange(Directory.EnumerateFiles(resource, "*.*", SearchOption.AllDirectories));
}
return result;
}
That is simple, but it can be slow for a giant tree, so what I am trying to do is something like:
public ConcurrentBag<Task<IEnumerable<string>>> GetFiles()
{
var tasks = new ConcurrentBag<Task<IEnumerable<string>>>();
Parallel.ForEach(_root, (resource, token) =>
{
this.ValidateRootFolder(resource);
var task = Task.Run(() => Directory.EnumerateFiles(resource, "*.*", SearchOption.AllDirectories));
tasks.Add(task);
});
return tasks;
}
This creates the task collection, so I can execute those tasks at the endpoint, something like:
[HttpGet, ActionName("GetFiles")]
public IActionResult GetFiles()
{
var tasks = _fileService.GetFiles();
return Ok(tasks); // how to make stream out of all these files
}
So my question is: how can I convert this task collection into a stream of results, and is that possible at all? If not, is there another way to do this?
Parallelism in a web application isn't always a great idea, because it uses threads that would otherwise be serving web requests. Each web request is served by a separate thread, so if all cores are busy, incoming requests will have to wait.
Parallel.ForEach will use all available cores, which means no other request will be served until either Parallel.ForEach completes or one of the worker threads is rescheduled. In most web applications that would be very bad.
One way to handle this would be to use PLINQ to enumerate all folders with a limited degree-of-parallelism and return the results as a single list:
public IEnumerable<string> GetFiles()
{
    var files = _roots.AsParallel()
                      .WithDegreeOfParallelism(2)
                      .SelectMany(fld => Directory.EnumerateFiles(fld))
                      .AsEnumerable();
    return files;
}
or
public IEnumerable<string> GetFiles()
{
    var files = _roots.AsParallel()
                      .WithDegreeOfParallelism(2)
                      .SelectMany(fld => Directory.EnumerateFiles(fld))
                      .ToList();
    return files;
}
The benefit over Parallel.ForEach is that PLINQ handles the collection of the partial results into the final result set.
If you want the files grouped by root, you could use Select and Directory.GetFiles:
public IEnumerable<string[]> GetFiles()
{
    var files = _roots.AsParallel()
                      .WithDegreeOfParallelism(2)
                      .Select(fld => Directory.GetFiles(fld))
                      .ToList();
    return files;
}
You could also return a dictionary of files per root:
public Dictionary<string,string[]> GetFiles()
{
var files= _roots.AsParallel()
.WithDegreeOfParallelism(2)
.Select(root=>(root,files=Directory.GetFiles(fld)))
.ToDictionary(p=>p.root,p=>p.files);
}
No async
There's no Directory.EnumerateFilesAsync so there's no way to benefit from asynchronous (not parallel) enumeration. The reason is that not all OSs have async IO and even when they do, the file system drivers may not have asynchronous file enumeration.
Windows NT was asynchronous from the start, with blocking operations emulated at the API level. Windows 9x wasn't. Linux on the other hand was synchronous, with async I/O added later. Even Windows doesn't have an async directory enumeration API though, because not all file systems support this.
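For completeness, a minimal sketch of how the controller action from the question might consume the PLINQ-based GetFiles (assuming _fileService exposes the method above); the fully materialized list is serialized as an ordinary JSON array:

[HttpGet, ActionName("GetFiles")]
public IActionResult GetFiles()
{
    // The PLINQ-based service call returns when all roots have been enumerated.
    IEnumerable<string> files = _fileService.GetFiles();
    return Ok(files);
}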
I have an API which needs to be run in a loop for Mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However, it can be run in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
The current solution below just provides a list; would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>();
foreach(AddressRequestDto request in requests)
{
var newCombinedAddress = (await GetCombinedAddress(request)).Value;
combinedAddressResponses.Add(newCombinedAddress);
}
return combinedAddressResponses;
}
Update:
In the debugger I have to go to combinedAddressResponse.Result.Value, because combinedAddressResponse.Value is null.
Also, strangely, writing combinedAddressResponse.Result.Value gives the error below: "ActionResult does not contain a definition for 'Value' and no accessible extension method"
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
    var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
    foreach (var request in requests)
    {
        tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
    }
    // This waits for all the tasks to complete
    await Task.WhenAll(tasks);
    combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
    return combinedAddressResponses;
}
Looking for a way to speed things up and run in parallel, thanks.
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
// Build the collection of tasks by doing an asynchronous operation for each request.
var tasks = requests.Select(async request =>
{
var combinedAddressResponse = await GetCombinedAddress(request);
return combinedAddressResponse.Value;
}).ToList();
// Wait for all the tasks to complete and get the results.
var results = await Task.WhenAll(tasks);
return results;
}
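If the request list can be very large, one optional refinement, shown here only as a hedged sketch, is to cap the number of in-flight calls with a SemaphoreSlim while keeping the same Task.WhenAll pattern (the limit of 10 is an arbitrary placeholder, not something the answer above prescribes):

public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    // Hypothetical cap on concurrent calls; tune it to what the downstream service can handle.
    using var throttle = new SemaphoreSlim(10);

    var tasks = requests.Select(async request =>
    {
        await throttle.WaitAsync();
        try
        {
            var combinedAddressResponse = await GetCombinedAddress(request);
            return combinedAddressResponse.Value;
        }
        finally
        {
            throttle.Release();
        }
    }).ToList();

    var results = await Task.WhenAll(tasks);
    return results;
}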
I'm submitting a series of select statements (queries - thousands of them) to a single database synchronously and getting back one DataTable per query (Note: This program is such that it has knowledge of the DB schema it is scanning only at run time, hence the use of DataTables). The program runs on a client machine and connects to DBs on a remote machine. It takes a long time to run so many queries. So, assuming that executing them async or in parallel will speed things up, I'm exploring TPL Dataflow (TDF). I want to use the TDF library because it seems to handle all of the concerns related to writing multi-threaded code that would otherwise need to be done by hand.
The code shown is based on http://blog.i3arnon.com/2016/05/23/tpl-dataflow/. It's minimal and is just to help me understand the basic operations of TDF. Please do know I've read many blogs and coded many iterations trying to crack this nut.
Nonetheless, with this current iteration, I have one problem and a question:
Problem
The code is inside a button click method (Using a UI, a user selects a machine, a sql instance, and a database, and then kicks off the scan). The two lines with the await operator return an error at build time: The 'await' operator can only be used within an async method. Consider marking this method with the 'async' modifier and changing its return type to 'Task'. I can't change the return type of the button click method. Do I need to somehow isolate the button click method from the async-await code?
Question
Although I've found beaucoup write-ups describing the basics of TDF, I can't find an example of how to get my hands on the output that each invocation of the TransformBlock produces (i.e., a DataTable). Although I want to submit the queries async, I do need to block until all queries submitted to the TransformBlock are completed. How do I get my hands on the series of DataTables produced by the TransformBlock, and block until all queries are complete?
Note: I acknowledge that I have only one block now. At a minimum, I'll be adding a cancellation block and so do need/want to use TPL.
private async Task ToolStripButtonStart_Click(object sender, EventArgs e)
{
UserInput userInput = new UserInput
{
MachineName = "gat-admin",
InstanceName = "",
DbName = "AdventureWorks2014",
};
DataAccessLayer dataAccessLayer = new DataAccessLayer(userInput.MachineName, userInput.InstanceName);
//CreateTableQueryList gets a list of all tables from the DB and returns a list of
// select statements, one per table, e.g., SELECT * from [schemaname].[tablename]
IList<String> tableQueryList = CreateTableQueryList(userInput);
// Define a block that accepts a select statement and returns a DataTable of results
// where each returned record is: schemaname + tablename + columnname + column datatype + field data
// e.g., if the select query returns one record with 5 columns, then a datatable with 5
// records (one per field) will come back
var transformBlock_SubmitTableQuery = new TransformBlock<String, Task<DataTable>>(
async tableQuery => await dataAccessLayer._SubmitSelectStatement(tableQuery),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2,
});
// Add items to the block and start processing
foreach (String tableQuery in tableQueryList)
{
await transformBlock_SubmitTableQuery.SendAsync(tableQuery);
}
// Enable the Cancel button and disable the Start button.
toolStripButtonStart.Enabled = false;
toolStripButtonStop.Enabled = true;
//shut down the block (no more inputs or outputs)
transformBlock_SubmitTableQuery.Complete();
//await the completion of the task that produces the output DataTable
await transformBlock_SubmitTableQuery.Completion;
}
public async Task<DataTable> _SubmitSelectStatement(string queryString )
{
try
{
.
.
await Task.Run(() => sqlDataAdapter.Fill(dt));
// process dt into the output DataTable I need
return outputDt;
}
catch
{
throw;
}
}
The cleanest way to retrieve the output of a TransformBlock is to perform a nested loop using the methods OutputAvailableAsync and TryReceive. It is a bit verbose, so you could consider encapsulating this functionality in an extension method ToListAsync:
public static async Task<List<T>> ToListAsync<T>(this IReceivableSourceBlock<T> source,
CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(source);
List<T> list = new();
while (await source.OutputAvailableAsync(cancellationToken).ConfigureAwait(false))
{
while (source.TryReceive(out T item))
{
list.Add(item);
}
}
Debug.Assert(source.Completion.IsCompleted);
await source.Completion.ConfigureAwait(false); // Propagate possible exception
return list;
}
Then you could use the ToListAsync method like this:
private async Task ToolStripButtonStart_Click(object sender, EventArgs e)
{
TransformBlock<string, DataTable> transformBlock = new(async query => //...
//...
transformBlock.Complete();
foreach (DataTable dataTable in await transformBlock.ToListAsync())
{
// Do something with each dataTable
}
}
Note: this ToListAsync implementation is destructive, meaning that in case of an error the consumed messages are discarded. To make it non-destructive, just remove the await source.Completion line. In this case you'll have to remember to await the Completion of the block after processing the list with the consumed messages, otherwise you won't be aware if the TransformBlock failed to process all of its input.
Alternative ways to retrieve the output of a dataflow block do exist, for example this one by dcastro uses a BufferBlock as a buffer and is slightly more performant, but personally I find the approach above to be safer and more straightforward.
Instead of waiting for the completion of the block before retrieving the output, you could also retrieve it in a streaming manner, as an IAsyncEnumerable<T> sequence:
public static async IAsyncEnumerable<T> ToAsyncEnumerable<T>(
this IReceivableSourceBlock<T> source,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(source);
while (await source.OutputAvailableAsync(cancellationToken).ConfigureAwait(false))
{
while (source.TryReceive(out T item))
{
yield return item;
cancellationToken.ThrowIfCancellationRequested();
}
}
Debug.Assert(source.Completion.IsCompleted);
await source.Completion.ConfigureAwait(false); // Propagate possible exception
}
This way you will be able to get your hands to each DataTable immediately after it has been cooked, without having to wait for the processing of all queries. To consume an IAsyncEnumerable<T> you simply move the await before the foreach:
await foreach (DataTable dataTable in transformBlock.ToAsyncEnumerable())
{
// Do something with each dataTable
}
Advanced: Below is a more sophisticated version of the ToListAsync method, that propagates all the errors of the underlying block, in the same direct way that are propagated by methods like the Task.WhenAll and Parallel.ForEachAsync. The original simple ToListAsync method wraps the errors in a nested AggregateException, using the Wait technique that is shown in this answer.
/// <summary>
/// Asynchronously waits for the successful completion of the specified source, and
/// returns all the received messages. In case the source completes with error,
/// the error is propagated and the received messages are discarded.
/// </summary>
public static Task<List<T>> ToListAsync<T>(this IReceivableSourceBlock<T> source,
CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(source);
async Task<List<T>> Implementation()
{
List<T> list = new();
while (await source.OutputAvailableAsync(cancellationToken)
.ConfigureAwait(false))
while (source.TryReceive(out T item))
list.Add(item);
await source.Completion.ConfigureAwait(false);
return list;
}
return Implementation().ContinueWith(t =>
{
if (t.IsCanceled) return t;
Debug.Assert(source.Completion.IsCompleted);
if (source.Completion.IsFaulted)
{
TaskCompletionSource<List<T>> tcs = new();
tcs.SetException(source.Completion.Exception.InnerExceptions);
return tcs.Task;
}
return t;
}, default, TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default).Unwrap();
}
.NET 6 update: A new API DataflowBlock.ReceiveAllAsync was introduced in .NET 6, with this signature:
public static IAsyncEnumerable<TOutput> ReceiveAllAsync<TOutput> (
this IReceivableSourceBlock<TOutput> source,
CancellationToken cancellationToken = default);
It is similar to the aforementioned ToAsyncEnumerable method. The important difference is that the new API does not propagate the possible exception of the consumed source block after propagating all of its messages. This behavior is not consistent with the analogous API ReadAllAsync from the Channels library. I have reported this inconsistency on GitHub, and the issue is currently labeled by Microsoft as a bug.
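Until that issue is resolved, a possible workaround (just a sketch, not an official recommendation) is to await the block's Completion after the await foreach loop, so that a failure of the block still surfaces:

await foreach (DataTable dataTable in transformBlock.ReceiveAllAsync())
{
    // Do something with each dataTable
}
await transformBlock.Completion; // Propagate a possible exception of the block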
As it turns out, to meet my requirements, TPL Dataflow is a bit overkill. I was able to meet my requirements using async/await and Task.WhenAll. I used the Microsoft How-To How to: Extend the async Walkthrough by Using Task.WhenAll (C#) as a model.
Regarding my "Problem"
My "problem" is not a problem. An event method signature (in my case, a "Start" button click method that initiates my search) can be modified to be async. In the Microsoft How-To GetURLContentsAsync solution, see the startButton_Click method signature:
private async void startButton_Click(object sender, RoutedEventArgs e)
{
.
.
}
Regarding my question
Using Task.WhenAll, I can wait for all my queries to finish and then process all the outputs for use on my UI. In the Microsoft How-To GetURLContentsAsync solution, see the SumPageSizesAsync method; the array of int named lengths collects the outputs of all the download tasks.
private async Task SumPageSizesAsync()
{
.
.
// Create a query.
IEnumerable<Task<int>> downloadTasksQuery = from url in urlList select ProcessURLAsync(url);
// Use ToArray to execute the query and start the download tasks.
Task<int>[] downloadTasks = downloadTasksQuery.ToArray();
// Await the completion of all the running tasks.
Task<int[]> whenAllTask = Task.WhenAll(downloadTasks);
int[] lengths = await whenAllTask;
.
.
}
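Applied to the table queries from my question, the same pattern looks roughly like this (a sketch only; CreateTableQueryList and _SubmitSelectStatement are the methods shown earlier in the question, and ScanTablesAsync is a hypothetical name):

private async Task ScanTablesAsync(UserInput userInput, DataAccessLayer dataAccessLayer)
{
    IList<string> tableQueryList = CreateTableQueryList(userInput);

    // Start one query task per select statement.
    Task<DataTable>[] queryTasks = tableQueryList
        .Select(tableQuery => dataAccessLayer._SubmitSelectStatement(tableQuery))
        .ToArray();

    // Await the completion of all the running tasks.
    DataTable[] results = await Task.WhenAll(queryTasks);

    // Process the DataTables for use on the UI.
    foreach (DataTable dt in results)
    {
        // ...
    }
}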
Using Dataflow blocks properly results in both cleaner and faster code. Dataflow blocks aren't agents or tasks. They're meant to work in a pipeline of blocks, connected with LinkTo calls, not manual coding.
It seems the scenario is to download some data, eg some CSVs, parse them and insert them to a database. Each of those steps can go into its own block:
a Downloader with a DOP>1, to allow multiple downloads run concurrently without flooding the network.
a Parser that converts the files into arrays of objects
an Importer that uses SqlBulkCopy to bulk insert the rows into the database in the fastest way possible, using minimal logging.
var downloadDOP=8;
var parseDOP=2;
var tableName="SomeTable";
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
var downloadOptions =new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = downloadDOP,
};
var parseOptions =new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = parseDOP,
};
With these options, we can construct a pipeline of blocks
//HttpClient is thread-safe and reusable
HttpClient httpClient = new HttpClient(...);

var downloader = new TransformBlock<(Uri uri, string path), FileInfo>(async input =>
{
    var (uri, path) = input;
    var file = new FileInfo(path);
    using var stream = await httpClient.GetStreamAsync(uri);
    using var fileStream = file.Create();
    await stream.CopyToAsync(fileStream);
    return file;
}, downloadOptions);

var parser = new TransformBlock<FileInfo, Foo[]>(async file =>
{
    using var reader = file.OpenText();
    using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
    var records = csv.GetRecords<Foo>().ToArray();
    return records;
}, parseOptions);

var importer = new ActionBlock<Foo[]>(async recs =>
{
    using var bcp = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock);
    bcp.DestinationTableName = tableName;
    //Map columns if needed
    ...
    using var reader = ObjectReader.Create(recs);
    await bcp.WriteToServerAsync(reader);
});
downloader.LinkTo(parser,linkOptions);
parser.LinkTo(importer,linkOptions);
Once the pipeline is complete, you can start posting Uris to the head block and await until the tail block completes:
IEnumerable<(Uri,string)> filesToDownload = ...
foreach(var pair in filesToDownload)
{
await downloader.SendAsync(pair);
}
downloader.Complete();
await importer.Completion;
The code uses CsvHelper to parse the CSV file and FastMember's ObjectReader to create an IDataReader wrapper over the CSV records.
Inside each block you can use a Progress<T> instance to update the UI based on the pipeline's progress.
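For example, here's a hedged sketch of how an IProgress<string> created on the UI thread could be passed into the importer block above (statusLabel is a hypothetical UI control and the message format is illustrative):

// Created on the UI thread, so reports are marshalled back to it automatically.
IProgress<string> progress = new Progress<string>(msg => statusLabel.Text = msg);

var importer = new ActionBlock<Foo[]>(async recs =>
{
    using var bcp = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock);
    bcp.DestinationTableName = tableName;
    using var reader = ObjectReader.Create(recs);
    await bcp.WriteToServerAsync(reader);
    progress.Report($"Imported {recs.Length} rows");
});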
I have a foreach() that loops through 15 reports and generates a PDF for each. The PDF generation process is slow (3 seconds each). But if I could generate them all concurrently with threads, maybe all 15 could be done in 4-5 seconds total. One constraint is that the function must not return until ALL pdfs have generated. Also, will 15 concurrent worker threads cause problems or instability for dotnet/windows?
Here is my pseudocode:
private void makePDFs(string path)
{
    string[] folders = Directory.GetDirectories(path);
    foreach (string folderPath in folders)
    {
        generatePDF(...);
    }
    // DO NOT RETURN UNTIL ALL PDFs HAVE BEEN GENERATED
}
What is the simplest way to achieve this?
The most straightforward approach is to use Parallel.ForEach:
private void makePDFs(string path)
{
string[] folders = Directory.GetDirectories(path);
Parallel.ForEach(folders, (folderPath) =>
{
    generatePDF(folderPath);
});
//WILL NOT RETURN UNTIL ALL PDFs HAVE BEEN GENERATED
}
This way you avoid having to create, keep track of, and await each separate task; the TPL does it all for you.
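If 15 concurrent PDF generations turn out to stress the machine, an optional refinement (a sketch, not something the question requires) is to cap the parallelism via ParallelOptions:

Parallel.ForEach(folders,
    new ParallelOptions { MaxDegreeOfParallelism = 4 }, // hypothetical cap
    folderPath => generatePDF(folderPath));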
You need to get a list of tasks and then use Task.WhenAll to wait for completion
var tasks = folders.Select(folder => Task.Run(() => generatePDF(folder)));
await Task.WhenAll(tasks);
If you can't or don't want to use async/await you can use:
Task.WaitAll(tasks.ToArray());
It will block the current thread until all tasks are completed, so I'd recommend the first approach if you can.
You can also run your PDF generation in parallel using the Parallel class:
Parallel.ForEach(folders, folder => generatePDF(folder));
Please see this answer to choose which approach works the best for your problem.
.NET has a handy method just for this: Task.WhenAll(IEnumerable<Task>)
It will wait for all tasks in the IEnumerable to finish before continuing. It is an async method, so you need to await it.
var tasks = new List<Task>();
foreach(string folderPath in folders) {
tasks.Add(Task.Run(() => generatePDF(folderPath)));
}
await Task.WhenAll(tasks);
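Note that awaiting Task.WhenAll requires the enclosing method to be asynchronous; here is a minimal sketch of how makePDFs might be reshaped (renamed makePDFsAsync purely for illustration):

private async Task makePDFsAsync(string path)
{
    string[] folders = Directory.GetDirectories(path);
    var tasks = new List<Task>();
    foreach (string folderPath in folders)
    {
        tasks.Add(Task.Run(() => generatePDF(folderPath)));
    }
    // Does not return until all PDFs have been generated.
    await Task.WhenAll(tasks);
}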
I have the following code that is meant to fetch data from a REST service (the Get(i) calls), then populate a matrix (not shown; this happens within addLabels()) with their relationships.
All of the Get() calls can run in parallel to each other, but they must all be finished before anything enters the second loop (where, once again, the calls can run in parallel to each other). The addLabel() calls depend on the work from the Get() calls to be complete.
**To anyone stumbling across this post, this code is the solution:**
private async void GetTypeButton_Click(object sender, RoutedEventArgs e)
{
await PokeType.InitTypes(); // initializes relationships in the matrix
var table = PokeType.EffectivenessMatrix;
// pretty-printing the table
// ...
// ...
}
private static bool initialized = false;
public static async Task InitTypes()
{
if (initialized) return;
// await blocks until first batch is finished
await Task.WhenAll(Enumerable.Range(1, NUM_TYPES /* inclusive */).Select(i => Get(i)));
// doesn't need to be parallelized because it's quick work.
foreach(PokeType type in cachedTypes.Values)
{
JObject data = type.GetJsonFromCache();
addLabels(type, (JArray)data["super_effective"], Effectiveness.SuperEffectiveAgainst);
addLabels(type, (JArray)data["ineffective"], Effectiveness.NotVeryEffectiveAgainst);
addLabels(type, (JArray)data["no_effect"], Effectiveness.UselessAgainst);
}
initialized = true;
}
public static async Task<PokeType> Get(int id);
As the code is currently written, the InitTypes() method attempts to enter both loops simultaneously; the cachedTypes dictionary is empty because the first loop hasn't finished populating it yet, so it never runs and no relationships are constructed.
How can I properly structure this function? Thanks!
Parallel and async-await don't go together well. Your async lambda expression is actually async void since Parallel.For expects an Action<int>, which means that Parallel.For can't wait for that operation to complete.
If you're trying to call Get(i) multiple times concurrently and wait for them to complete before moving on you need to use Task.WhenAll:
await Task.WhenAll(Enumerable.Range(1, NUM_TYPES).Select(i => Get(i)));