I want to upload potentially large batches (possibly hundreds) of files to an SFTP server, using the SSH.NET library and the Renci.SshNet.Async extensions. I need to limit the number of concurrent uploads to five, or whatever number I discover the server can handle.
This is my code before any limiting:
using (var sftp = new SftpClient(sftpHost, 22, sftpUser, sftpPass))
{
    var tasks = new List<Task>();
    try
    {
        sftp.Connect();
        foreach (var file in Directory.EnumerateFiles(localPath, "*.xml"))
        {
            tasks.Add(
                sftp.UploadAsync(
                    File.OpenRead(file),       // Stream input
                    Path.GetFileName(file),    // string path
                    true));                    // bool canOverride
        }
        await Task.WhenAll(tasks);
        sftp.Disconnect();
    }
    // trimmed catch
}
I've read about SemaphoreSlim, but I don't fully understand how it works and how it is used with TAP. This is, based on the MSDN documentation, how I would implement it.
I'm unsure if using Task.Run is the correct way to go about this, as it's I/O bound, and from what I know, Task.Run is for CPU-bound work and async/await for I/O-bound work. I also don't understand how these tasks enter (is that the correct terminology) the semaphore, as all they do is call .Release() on it.
using (var sftp = new SftpClient(sftpHost, 22, sftpUser, sftpPass))
{
    var tasks = new List<Task>();
    var semaphore = new SemaphoreSlim(5);
    try
    {
        sftp.Connect();
        foreach (var file in Directory.EnumerateFiles(localPath, "*.xml"))
        {
            tasks.Add(
                Task.Run(() =>
                {
                    sftp.UploadAsync(
                        File.OpenRead(file),       // Stream input
                        Path.GetFileName(file),    // string path
                        true);                     // bool canOverride
                    semaphore.Release();
                }));
        }
        await Task.WhenAll(tasks);
        sftp.Disconnect();
    }
    // trimmed catch
}
from what I know, Task.Run is for CPU-bound work
Correct.
and async/await for I/O-bound work.
No. await is a tool for adding continuations to an asynchronous operation. It doesn't care about the nature of what that asynchronous operation is. It simply makes it easier to compose asynchronous operations of any kind together.
If you want to compose several asynchronous operations together, you do that by writing an async method, using the various asynchronous operations, awaiting them when you need their results (or need them to be completed), and then using the Task from that method as its own new asynchronous operation.
In your case your new asynchronous operation simply needs to await the semaphore, upload your file, and then release the semaphore.
async Task UploadFile(string file)
{
    await semaphore.WaitAsync();
    try
    {
        await sftp.UploadAsync(
            File.OpenRead(file),
            Path.GetFileName(file),
            true);
    }
    finally
    {
        semaphore.Release();
    }
}
Now you can simply call that method for each file.
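For example (a sketch that assumes the UploadFile method above takes the file path as a parameter and captures sftp and semaphore from the enclosing scope):
var tasks = new List<Task>();
foreach (var file in Directory.EnumerateFiles(localPath, "*.xml"))
{
    tasks.Add(UploadFile(file)); // each call waits on the semaphore internally
}
await Task.WhenAll(tasks);       // at most five uploads are in flight at any time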
Additionally, because this is such a common operation to do, you may find it worth it to create a new class to handle this logic so that you can simply make a queue, and add items to the queue, and have it handle the throttling internally, rather than replicating that mechanic everywhere you use it.
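A minimal sketch of such a helper, assuming a simple SemaphoreSlim-based throttle (the ThrottledTaskQueue name and its API are illustrative, not an existing library type):
public sealed class ThrottledTaskQueue
{
    private readonly SemaphoreSlim _semaphore;

    public ThrottledTaskQueue(int maxConcurrency)
    {
        _semaphore = new SemaphoreSlim(maxConcurrency);
    }

    // Queues a piece of asynchronous work; the returned task completes when the work has run.
    public async Task EnqueueAsync(Func<Task> work)
    {
        await _semaphore.WaitAsync();
        try
        {
            await work();
        }
        finally
        {
            _semaphore.Release();
        }
    }
}
Each caller then writes something like await queue.EnqueueAsync(() => sftp.UploadAsync(stream, remotePath, true)), and the class keeps at most maxConcurrency uploads in flight.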
Related
Since we expect to be reading frequently, and for us to often be reading when data is already available to be consumed, should SendLoopAsync return ValueTask rather than Task, so that we can make it allocation-free?
// Caller
_ = Task.Factory.StartNew(_ => SendLoopAsync(cancellationToken), TaskCreationOptions.LongRunning, cancellationToken);
// Method
private async ValueTask SendLoopAsync(CancellationToken cancellationToken)
{
    while (await _outputChannel.Reader.WaitToReadAsync(cancellationToken).ConfigureAwait(false))
    {
        while (_outputChannel.Reader.TryRead(out var message))
        {
            using (await _mutex.LockAsync(cancellationToken).ConfigureAwait(false))
            {
                await _clientWebSocket.SendAsync(message.Data.AsMemory(), message.MessageType, true, cancellationToken).ConfigureAwait(false);
            }
        }
    }
}
No, there is no value in SendLoopAsync returning a ValueTask instead of a Task. This method is invoked only once in your code, so the impact of avoiding a single allocation of a tiny object is practically zero. You should consider ValueTask for asynchronous methods that are invoked repeatedly in loops, especially on hot paths. That's not the case in the example presented in the question.
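For illustration only (this example is not from the question): the pattern where ValueTask pays off is a method that usually completes synchronously and is called in a tight loop, such as a buffered reader that only occasionally has to await real I/O.
public sealed class BufferedByteReader
{
    private readonly Stream _stream;
    private readonly byte[] _buffer = new byte[4096];
    private int _pos, _len;

    public BufferedByteReader(Stream stream) => _stream = stream;

    public ValueTask<int> ReadByteAsync()
    {
        if (_pos < _len)
            return new ValueTask<int>(_buffer[_pos++]); // common case: completes synchronously, no allocation
        return RefillAndReadAsync();                    // rare case: actually awaits the stream
    }

    private async ValueTask<int> RefillAndReadAsync()
    {
        _len = await _stream.ReadAsync(_buffer, 0, _buffer.Length);
        _pos = 0;
        return _len == 0 ? -1 : _buffer[_pos++];
    }
}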
As a side note, invoking asynchronous methods with the Task.Factory.StartNew + TaskCreationOptions.LongRunning combination is pointless. The new thread that is created will have a very short life: it will be terminated as soon as the code reaches the first await of an incomplete awaitable inside the async method. You also get back a nested Task<Task>, which is tricky to handle. Using Task.Run is preferable. You can read here the reasons why.
Also be aware that the Nito.AsyncEx.AsyncLock class is not optimized for memory-efficiency. Lots of allocations are happening every time the lock is acquired. If you want a low-allocation synchronization primitive that can be acquired asynchronously, your best bet currently is probably to use a Channel<object> instance, initialized with a single null value: retrieve the value to enter, store it back to release.
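A minimal sketch of that idea (the ChannelLock name is illustrative, not an existing library type): a bounded channel seeded with one token, where reading the token acquires the lock and writing it back releases it.
public sealed class ChannelLock
{
    private readonly Channel<object> _channel;

    public ChannelLock()
    {
        _channel = Channel.CreateBounded<object>(1);
        _channel.Writer.TryWrite(null); // one token in the channel means the lock is free
    }

    // Acquire: take the token out of the channel, waiting asynchronously if another caller holds it.
    public ValueTask<object> EnterAsync(CancellationToken cancellationToken = default)
        => _channel.Reader.ReadAsync(cancellationToken);

    // Release: put the token back so the next waiter can proceed.
    public void Exit() => _channel.Writer.TryWrite(null);
}
The send loop would then wrap each SendAsync call in await _lock.EnterAsync(...) / try / finally _lock.Exit() instead of the Nito AsyncLock.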
The idiomatic way to use channels doesn't need locks, semaphores or Task.Factory.StartNew. The typical way to use a channel is to have a method that accepts just a ChannelReader as input. If the method wants to use a Channel as output, it should create it itself and only return a ChannelReader that can be passed to other methods. By owning the channel the method knows when it can be closed.
In the question's case, the code is simple enough though. A simple await foreach should be enough:
private async ValueTask SendLoopAsync(ChannelReader<Message> reader,
    CancellationToken cancellationToken)
{
    await foreach (var msg in reader.ReadAllAsync(cancellationToken))
    {
        await _clientWebSocket.SendAsync(msg.Data.AsMemory(),
                                         msg.MessageType,
                                         true, cancellationToken);
    }
}
This method doesn't need an external Task.Run or Task.Factory.StartNew to work. To run it, just call it and store the returned task somewhere instead of discarding it:
public MyWorker(ChannelReader<Message> reader, CancellationToken token)
{
    .....
    _loopTask = SendLoopAsync(reader, token);
}
This way, once the input channel completes, the code can await _loopTask to finish processing any pending messages.
Any blocking work inside the loop should be wrapped in Task.Run(), e.g.:
private async ValueTask SendLoopAsync(ChannelReader<Message> reader,
    CancellationToken cancellationToken)
{
    await foreach (var msg in reader.ReadAllAsync(cancellationToken))
    {
        var newMsg = await Task.Run(() => SomeHeavyWork(msg), cancellationToken);
        await _clientWebSocket.SendAsync(newMsg.Data.AsMemory(),
                                         newMsg.MessageType,
                                         true, cancellationToken);
    }
}
Concurrent Workers
This method could be used to start multiple concurrent workers too:
// AsTask() is needed because Task.WhenAll works with Task, not ValueTask
var tasks = Enumerable.Range(0, dop).Select(_ => SendLoopAsync(reader, token).AsTask());
_loopTask = Task.WhenAll(tasks);
...
await _loopTask;
In .NET 6, Parallel.ForEachAsync can be used to process multiple messages with less code:
private async ValueTask SendLoopAsync(ChannelReader<Message> reader,
    CancellationToken cancellationToken)
{
    var options = new ParallelOptions
    {
        CancellationToken = cancellationToken,
        MaxDegreeOfParallelism = 4
    };
    var input = reader.ReadAllAsync(cancellationToken);
    await Parallel.ForEachAsync(input, options, async (msg, token) =>
    {
        var newMsg = await Task.Run(() => SomeHeavyWork(msg), token);
        await _clientWebSocket.SendAsync(newMsg.Data.AsMemory(),
                                         newMsg.MessageType,
                                         true, token);
    });
}
Idiomatic Channel Producers
Instead of using a class-level channel stored in a field, create the channel inside the producer method and only return its reader. This way the producer method controls the channel's lifecycle and can close it when it's done. That's one of the reasons a Channel can only be accessed through its Reader and Writer classes.
A method can consume a ChannelReader and return another. This allows creating methods that can be chained together into a pipeline.
A simple producer can look like this:
ChannelReader<Message> Producer(CancellationToken token)
{
    var channel = Channel.CreateUnbounded<Message>();
    var writer = channel.Writer;
    _ = Task.Run(async () =>
    {
        while (!token.IsCancellationRequested)
        {
            var msg = SomeHeavyJob();
            await writer.WriteAsync(msg, token);
        }
    }, token)
    .ContinueWith(t => writer.TryComplete(t.Exception));
    return channel.Reader;
}
When cancellation is signaled or an exception is thrown, the worker exits, the main task completes, and ContinueWith calls TryComplete on the writer with any exception that may have been thrown. That's a simple non-blocking operation, so it doesn't matter what thread it runs on.
A transforming method would look like this:
ChannelReader<Msg2> Transform(ChannelReader<Msg1> input, CancellationToken token)
{
    var channel = Channel.CreateUnbounded<Msg2>();
    var writer = channel.Writer;
    _ = Task.Run(async () =>
    {
        await foreach (var msg1 in input.ReadAllAsync(token))
        {
            var msg2 = SomeHeavyJob(msg1);
            await writer.WriteAsync(msg2, token);
        }
    }, token)
    .ContinueWith(t => writer.TryComplete(t.Exception));
    return channel.Reader;
}
Turning those methods into static extension methods would allow chaining them one after the other:
var task=Producer()
.Transformer()
.Consumer();
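As a rough sketch (the Transformer and Consumer names match the chain above but are otherwise illustrative, and SomeHeavyJob and the Msg1/Msg2 types are assumed to be accessible), the transforming step could become an extension method like this:
public static class ChannelPipelineExtensions
{
    public static ChannelReader<Msg2> Transformer(
        this ChannelReader<Msg1> input, CancellationToken token = default)
    {
        var channel = Channel.CreateUnbounded<Msg2>();
        var writer = channel.Writer;
        _ = Task.Run(async () =>
        {
            await foreach (var msg1 in input.ReadAllAsync(token))
            {
                await writer.WriteAsync(SomeHeavyJob(msg1), token);
            }
        }, token)
        .ContinueWith(t => writer.TryComplete(t.Exception));
        return channel.Reader;
    }
}
Consumer would follow the same pattern but return the Task of its final await foreach loop instead of a new reader.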
I don't set SingleWriter because it doesn't seem to do anything yet; searching for it in the .NET runtime repo on GitHub doesn't show any results beyond test code.
I want to call an asynchronous method multiple times in a xUnit test and wait for all calls to complete before I continue execution. I read that I can use Task.WhenAll() and Task.WaitAll() for precisely this scenario. For some reason however, the code is deadlocking.
[Fact]
public async Task GetLdapEntries_ReturnsLdapEntries()
{
    var ldapEntries = _fixture.CreateMany<LdapEntryDto>(2).ToList();
    var creationTasks = new List<Task>();
    foreach (var led in ldapEntries)
    {
        var task = _attributesServiceClient.CreateLdapEntry(led);
        task.Start();
        creationTasks.Add(task);
    }
    Task.WaitAll(creationTasks.ToArray()); //<-- deadlock(?) here
    //await Task.WhenAll(creationTasks);
    var result = await _ldapAccess.GetLdapEntries();
    result.Should().BeEquivalentTo(ldapEntries);
}
public async Task<LdapEntryDto> CreateLdapEntry(LdapEntryDto ldapEntryDto)
{
    using (var creationResponse = await _httpClient.PostAsJsonAsync<LdapEntryDto>("", ldapEntryDto))
    {
        if (creationResponse.StatusCode == HttpStatusCode.Created)
        {
            return await creationResponse.Content.ReadAsAsync<LdapEntryDto>();
        }
        throw await buildException(creationResponse);
    }
}
The system under test is a wrapper around an HttpClient that calls a web service, awaits the response, and possibly awaits reading the response's content that is finally deserialized and returned.
When I change the foreach part in the test to the following (i.e., don't use Task.WhenAll() / WaitAll()), the code runs without a deadlock:
foreach (var led in ldapEntries)
{
    await _attributesServiceClient.CreateLdapEntry(led);
}
What exactly is happening?
EDIT: While this question has been marked as duplicate, I don't see how the linked question relates to this one. The code examples in the link all use .Result which, as far as I understand, blocks the execution until the task has finished. In contrast, Task.WhenAll() returns a task that can be awaited and that finishes when all tasks have finished. So why is awaiting Task.WhenAll() deadlocking?
The code you posted cannot possibly have the behavior described. The first call to Task.Start would throw an InvalidOperationException, failing the test.
I read that I can use Task.WhenAll() and Task.WaitAll() for precisely this scenario.
No; to asynchronously wait on multiple tasks, you must use Task.WhenAll, not Task.WaitAll.
Example:
[Fact]
public async Task GetLdapEntries_ReturnsLdapEntries()
{
    var ldapEntries = new List<int> { 0, 1 };
    var creationTasks = new List<Task>();
    foreach (var led in ldapEntries)
    {
        var task = CreateLdapEntry(led);
        creationTasks.Add(task);
    }
    await Task.WhenAll(creationTasks);
}

public async Task<string> CreateLdapEntry(int ldapEntryDto)
{
    await Task.Delay(500);
    return "";
}
Task.WaitAll() will deadlock simply because it blocks the current thread until the tasks are finished. Since you are using async/await rather than separate threads, the continuations of those tasks need to resume on the very thread you have just blocked with Task.WaitAll(), so they never get the chance to run and the tasks never complete.
Not sure why WhenAll is also deadlocking for you here though, it definitely shouldn't.
PS: you don't need to call Start on tasks returned by an async method: they are "hot" (already started) upon creation.
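A tiny illustration of that last point (not from the original post): calling Start on a promise-style task throws, because such a task has no delegate to run and is already in flight.
Task t = Task.Delay(100); // a promise-style task, already "hot"
t.Start();                // throws InvalidOperationException: Start may not be called on a promise-style task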
I'm getting a bit confused with Task.Run and all I read about it on the internet. So here's my case: I have some function that handles incoming socket data:
public async Task Handle(Client client)
{
    while (true)
    {
        var data = await client.ReadAsync();
        await this.ProcessData(client, data);
    }
}
but this has a disadvantage that I can only read next request once I've finished processing the last one. So here's a modified version:
public async Task Handle(Client client)
{
    while (true)
    {
        var data = await client.ReadAsync();
        Task.Run(async () => {
            await this.ProcessData(client, data);
        });
    }
}
It's a simplified version. For a more advanced one I would of course restrict the maximum number of parallel requests.
Anyway this ProcessData is mostly IO-bound (doing some calls to dbs, very light processing and sending data back to client) yet I keep reading that I should use Task.Run with CPU-bound functions.
Is that a correct usage of Task.Run for my case? If not what would be an alternative?
Conceptually, that is a fine usage of Task.Run. It's very similar to how ASP.NET dispatches requests: (asynchronously) reading a request and then dispatching (synchronous or asynchronous) work to the thread pool.
In practice, you'll want to ensure that the result of ProcessData is handled properly. In particular, you'll want to observe exceptions. As the code currently stands, any exceptions from ProcessData will be swallowed, since the task returned from Task.Run is not observed.
IMO, the cleanest way to handle per-message errors is to have your own try/catch, as such:
public async Task Handle(Client client)
{
    while (true)
    {
        var data = await client.ReadAsync();
        Task.Run(async () => {
            try { await this.ProcessData(client, data); }
            catch (Exception ex) {
                // TODO: handle
            }
        });
    }
}
where the // TODO: handle is the appropriate error-handling code for your application. E.g., you might send an error response on the socket, or just log-and-ignore.
I have a foreach() that loops through 15 reports and generates a PDF for each. The PDF generation process is slow (3 seconds each). But if I could generate them all concurrently with threads, maybe all 15 could be done in 4-5 seconds total. One constraint is that the function must not return until ALL pdfs have generated. Also, will 15 concurrent worker threads cause problems or instability for dotnet/windows?
Here is my pseudocode:
private void makePDFs(string path) {
    string[] folders = Directory.GetDirectories(path);
    foreach (string folderPath in folders) {
        generatePDF(...);
    }
    // DO NOT RETURN UNTIL ALL PDFs HAVE BEEN GENERATED
}
What is the simplest way to achieve this?
The most straightforward approach is to use Parallel.ForEach:
private void makePDFs(string path)
{
    string[] folders = Directory.GetDirectories(path);
    Parallel.ForEach(folders, (folderPath) =>
    {
        generatePDF(folderPath);
    });
    //WILL NOT RETURN UNTIL ALL PDFs HAVE BEEN GENERATED
}
This way you avoid having to create, keep track of, and await each separate task; the TPL does it all for you.
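If you are worried about how many PDF generations run at once, Parallel.ForEach also accepts a ParallelOptions argument that caps the degree of parallelism; the value 4 below is just an illustration:
Parallel.ForEach(
    folders,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    folderPath => generatePDF(folderPath));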
You need to get a list of tasks and then use Task.WhenAll to wait for completion
var tasks = folders.Select(folder => Task.Run(() => generatePDF(...)));
await Task.WhenAll(tasks);
If you can't or don't want to use async/await you can use:
Task.WaitAll(tasks.ToArray());
It will block the current thread until all tasks are completed, so I'd recommend using the first approach if you can.
You can also run your PDF generation in parallel using Parallel C# class:
Parallel.ForEach(folders, folder => generatePDF(...));
Please see this answer to choose which approach works the best for your problem.
.NET has a handy method just for this: Task.WhenAll(IEnumerable<Task>)
It will wait for all tasks in the IEnumerable to finish before continuing. It is an async method, so you need to await it.
var tasks = new List<Task>();
foreach (string folderPath in folders) {
    tasks.Add(Task.Run(() => generatePDF(folderPath)));
}
await Task.WhenAll(tasks);
I'm writing a Windows service and am looking for a way to execute a number of foreach loops in parallel, where each loop makes a call to an asynchronous (TAP) method. I initially tried the following code, which doesn't work because Parallel.ForEach and async/await are not compatible. Does anyone know whether there is an alternate approach that can achieve this?
Parallel.ForEach(messagesByFromNumber, async messageGroup =>
{
    foreach (var message in messageGroup)
    {
        await message.SendAsync();
    }
});
For the sake of clarity, due to the way that SendAsync() operates, each copy of the foreach loop must execute serially; in other words, the foreach loop cannot become concurrent / parallel.
There's no need to use Parallel.Foreach if your goal is to run these concurrently. Simply go over all your groups, create a task for each group that does a foreach of SendAsync, get all the tasks, and await them all at once with Task.WhenAll:
var tasks = messagesByFromNumber.Select(async messageGroup =>
{
    foreach (var message in messageGroup)
    {
        await message.SendAsync();
    }
});
await Task.WhenAll(tasks);
You can make it neater with the AsyncEnumerator NuGet package:
using System.Collections.Async;
await messagesByFromNumber.ParallelForEachAsync(async messageGroup =>
{
    foreach (var message in messageGroup)
    {
        await message.SendAsync();
    }
}, maxDegreeOfParallelism: 10);